Upgrading the etcd Cluster and the Kubernetes Cluster

Both my etcd cluster and my Kubernetes cluster were installed from binaries, so upgrading mainly comes down to replacing the binary files.
Here I upgrade the etcd cluster from version 3.3.8 to 3.3.10, and the Kubernetes cluster from v1.11.1 to v1.13.0. Since my Kubernetes cluster uses keepalived + haproxy for a highly available, load-balanced dual-master setup, there is no concern about the cluster going offline during the upgrade.

Upgrading the Etcd Cluster

Pre-upgrade Checks

Check the cluster's health

# etcdctl --endpoints=https://192.168.100.181:2379 cluster-health
member 3a406a85e3de7ef5 is healthy: got healthy result from https://192.168.100.184:2379
member 695714eeb38cebbe is healthy: got healthy result from https://192.168.100.181:2379
member ab8f0f710ce0bf85 is healthy: got healthy result from https://192.168.100.183:2379
member c5cb8024e23348b6 is healthy: got healthy result from https://192.168.100.182:2379
member ceb2db537a9ec20d is healthy: got healthy result from https://192.168.100.185:2379
cluster is healthy

Check the version

# curl https://192.168.100.181:2379/version
{"etcdserver":"3.3.8","etcdcluster":"3.3.0"}

Back Up the Etcd Cluster with a Snapshot

The etcd leader holds the most up-to-date application data, so take the snapshot from the leader.
A member is the leader when its etcd_server_is_leader metric is 1; otherwise the metric is 0.

# curl -sL https://192.168.100.181:2379/metrics | grep etcd_server_is_leader
# HELP etcd_server_is_leader Whether or not this member is a leader. 1 if is, 0 otherwise.
# TYPE etcd_server_is_leader gauge
etcd_server_is_leader 0

# curl -sL https://192.168.100.182:2379/metrics | grep etcd_server_is_leader
# HELP etcd_server_is_leader Whether or not this member is a leader. 1 if is, 0 otherwise.
# TYPE etcd_server_is_leader gauge
etcd_server_is_leader 1

Of course, you can also find the leader with the following command:

# etcdctl --endpoints=https://192.168.100.181:2379 member list
3a406a85e3de7ef5: name=etcd-184 peerURLs=https://192.168.100.184:2380 clientURLs=https://192.168.100.184:2379 isLeader=false
695714eeb38cebbe: name=etcd-181 peerURLs=https://192.168.100.181:2380 clientURLs=https://192.168.100.181:2379 isLeader=false
ab8f0f710ce0bf85: name=etcd-183 peerURLs=https://192.168.100.183:2380 clientURLs=https://192.168.100.183:2379 isLeader=false
c5cb8024e23348b6: name=etcd-182 peerURLs=https://192.168.100.182:2380 clientURLs=https://192.168.100.182:2379 isLeader=true
ceb2db537a9ec20d: name=etcd-185 peerURLs=https://192.168.100.185:2380 clientURLs=https://192.168.100.185:2379 isLeader=false

Take a snapshot of the cluster

# ETCDCTL_API=3 etcdctl --endpoints https://192.168.100.182:2379 snapshot save snapshotdb
Snapshot saved at snapshotdb
# ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshotdb
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| c09e95e0 | 11794749 |       1226 |      19 MB |
+----------+----------+------------+------------+
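A backup is only useful if it can be restored. The sketch below shows one way the snapshot taken above could be restored for a single member. The member names and peer URLs mirror this post's five-node topology, while the data directory /var/lib/etcd-restored and the DRY_RUN guard (which only prints the command instead of running it) are my own assumptions:

```shell
#!/bin/sh
# Sketch only: restore the snapshot above for one member.
# /var/lib/etcd-restored is an assumed data directory; adjust to taste.
DRY_RUN=${DRY_RUN:-1}   # 1 = just print the etcdctl command, 0 = run it

# The five members from this post's cluster.
CLUSTER="etcd-181=https://192.168.100.181:2380,etcd-182=https://192.168.100.182:2380,etcd-183=https://192.168.100.183:2380,etcd-184=https://192.168.100.184:2380,etcd-185=https://192.168.100.185:2380"

restore_member() {
  name=$1; ip=$2
  cmd="ETCDCTL_API=3 etcdctl snapshot restore snapshotdb --name $name --initial-cluster $CLUSTER --initial-advertise-peer-urls https://$ip:2380 --data-dir /var/lib/etcd-restored"
  if [ "$DRY_RUN" = 1 ]; then echo "$cmd"; else eval "$cmd"; fi
}

restore_member etcd-181 192.168.100.181
```

When restoring for real, every member must be restored from the same snapshot file, each with its own --name and peer URL, before the restored cluster is started.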

Download and Unpack Etcd

# tar -zxvf etcd-v3.3.10-linux-amd64.tar.gz

Stop one of the existing Etcd servers

# systemctl stop etcd

Replace the Etcd binaries, then restart the Etcd server with the same configuration

# cp etcd-v3.3.10-linux-amd64/etcd /usr/bin/
# cp etcd-v3.3.10-linux-amd64/etcdctl /usr/bin/

# systemctl start etcd
# systemctl status etcd
# etcdctl --endpoints=https://192.168.100.181:2379 cluster-health
member 3a406a85e3de7ef5 is healthy: got healthy result from https://192.168.100.184:2379
member 695714eeb38cebbe is healthy: got healthy result from https://192.168.100.181:2379
member ab8f0f710ce0bf85 is healthy: got healthy result from https://192.168.100.183:2379
member c5cb8024e23348b6 is healthy: got healthy result from https://192.168.100.182:2379
member ceb2db537a9ec20d is healthy: got healthy result from https://192.168.100.185:2379
cluster is healthy

Repeat the steps above for the remaining members.
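The per-member cycle (stop, replace binaries, start, check health) can be scripted. This is only a sketch: it assumes passwordless SSH as root, the unpacked v3.3.10 binaries in the current directory, and by default (DRY_RUN=1) it prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch: repeat stop / replace / start on the remaining members.
# Assumes root SSH access and ./etcd-v3.3.10-linux-amd64/ locally.
DRY_RUN=${DRY_RUN:-1}

run() {   # print the command in dry-run mode, execute it otherwise
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

for ip in 192.168.100.182 192.168.100.183 192.168.100.184 192.168.100.185; do
  run ssh root@$ip systemctl stop etcd
  run scp etcd-v3.3.10-linux-amd64/etcd etcd-v3.3.10-linux-amd64/etcdctl root@$ip:/usr/bin/
  run ssh root@$ip systemctl start etcd
  run etcdctl --endpoints=https://$ip:2379 cluster-health
done
```

Upgrade one member at a time, and wait for the cluster to report healthy before moving on to the next.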

Members that have not yet been upgraded will log warnings like the following until the entire cluster has been upgraded:

# systemctl status etcd
the local etcd version 3.3.8 is not up-to-date
member 695714eeb38cebbe has a higher version 3.3.10

Check the health and version of the cluster members

# etcdctl --endpoints=https://192.168.100.181:2379 cluster-health
member 3a406a85e3de7ef5 is healthy: got healthy result from https://192.168.100.184:2379
member 695714eeb38cebbe is healthy: got healthy result from https://192.168.100.181:2379
member ab8f0f710ce0bf85 is healthy: got healthy result from https://192.168.100.183:2379
member c5cb8024e23348b6 is healthy: got healthy result from https://192.168.100.182:2379
member ceb2db537a9ec20d is healthy: got healthy result from https://192.168.100.185:2379
cluster is healthy

# curl https://192.168.100.181:2379/version
{"etcdserver":"3.3.10","etcdcluster":"3.3.0"}
# curl https://192.168.100.182:2379/version
{"etcdserver":"3.3.10","etcdcluster":"3.3.0"}
# curl https://192.168.100.183:2379/version
{"etcdserver":"3.3.10","etcdcluster":"3.3.0"}
# curl https://192.168.100.184:2379/version
{"etcdserver":"3.3.10","etcdcluster":"3.3.0"}
# curl https://192.168.100.185:2379/version
{"etcdserver":"3.3.10","etcdcluster":"3.3.0"}

Upgrading the Kubernetes Cluster

Check the current cluster version

# kubectl get node
NAME     STATUS   ROLES    AGE    VERSION
node01   Ready    <none>   131d   v1.11.1
node02   Ready    <none>   131d   v1.11.1
node03   Ready    <none>   131d   v1.11.1

Download and unpack the release

# tar -zxvf kubernetes-server-linux-amd64.tar.gz
# cd kubernetes/server/bin

Upgrading the Master Nodes

Stop the Master components

# systemctl stop kube-apiserver
# systemctl stop kube-controller-manager
# systemctl stop kube-scheduler

Replace the Master binaries

# cp kube-apiserver kube-controller-manager kube-scheduler kubeadm /usr/bin/

Restart the Master components

# systemctl start kube-apiserver
# systemctl status kube-apiserver

# systemctl start kube-controller-manager
# systemctl status kube-controller-manager

# systemctl start kube-scheduler
# systemctl status kube-scheduler

Repeat the steps above on the other Master nodes to upgrade them.

Upgrading the Worker Nodes

Mark the node unschedulable

Once the node is marked unschedulable, no new pods will be migrated to or deployed on it.

# kubectl cordon node01
node/node01 cordoned

# kubectl get node | grep node01
node01   Ready,SchedulingDisabled   <none>   131d   v1.11.1

Evict the node's Pods

Watch for resource bottlenecks during the eviction: if the remaining nodes are short on CPU, memory, or local storage, Kubernetes will not schedule the evicted pods anywhere, and they will sit in the Pending state until the node is brought back online (or more capacity is added).

# kubectl drain node01 --ignore-daemonsets --delete-local-data
node/node01 already cordoned
WARNING: Ignoring DaemonSet-managed pods: ......; Deleting pods with local storage: ......
pod/my-nginx-7ff9b54467-vk572 evicted
......
node/node01 evicted

Note: DaemonSet-managed pods require the --ignore-daemonsets flag;
pods using local storage require the --delete-local-data flag (their local data is deleted when they are rescheduled on another node, not moved).

Check whether any Pods remain on the node (ignoring DaemonSet pods)

# kubectl get pod -o wide --all-namespaces | grep node01

Check that the Pods have been moved to other nodes

# kubectl get pod -o wide --all-namespaces

Stop Kubelet and Kube-proxy on the node

# systemctl stop kubelet
# systemctl stop kube-proxy

Copy over the replacement binaries

# scp root@master1:/root/kubernetes/server/bin/kubelet /usr/bin/
# scp root@master1:/root/kubernetes/server/bin/kube-proxy /usr/bin/

Start the node's services again

# systemctl start kubelet
# systemctl status kubelet
# systemctl start kube-proxy
# systemctl status kube-proxy

Uncordon (bring back online) the node from a Master node

# kubectl uncordon node01
node/node01 uncordoned

# kubectl get node | grep node01
node01   Ready   <none>   131d   v1.13.0

Repeat the steps above on the remaining nodes to upgrade them.
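The whole per-node cycle above (cordon, drain, replace binaries, restart, uncordon) can be sketched as one loop. The node names and root SSH access are assumptions from this post's environment, and with DRY_RUN=1 (the default) the script only prints what it would do:

```shell
#!/bin/sh
# Sketch: upgrade the remaining worker nodes one at a time.
# Assumes root SSH access and the unpacked release under ./kubernetes/.
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

for node in node02 node03; do
  run kubectl cordon $node
  run kubectl drain $node --ignore-daemonsets --delete-local-data
  run ssh root@$node systemctl stop kubelet kube-proxy
  run scp kubernetes/server/bin/kubelet kubernetes/server/bin/kube-proxy root@$node:/usr/bin/
  run ssh root@$node systemctl start kubelet kube-proxy
  run kubectl uncordon $node
done
```

In a real run you would also pause between nodes to confirm the drained pods are Running elsewhere before uncordoning and moving on.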

Verify that the upgrade succeeded

# kubectl get node
NAME     STATUS   ROLES    AGE    VERSION
node01   Ready    <none>   131d   v1.13.0
node02   Ready    <none>   131d   v1.13.0
node03   Ready    <none>   131d   v1.13.0