## Using containerd with kubeadm

I won't repeat the host initialization steps here; refer to the initialization part of the article 使用Kubeadm搭建一个高可用集群. I'll start directly from the containerd installation, using three hosts for this demonstration.
| IP | Hostname | Role | CPU | Memory |
| --- | --- | --- | --- | --- |
| 172.16.50.200 | k8s-master-01 | master | 4 | 8G |
| 172.16.50.203 | k8s-node-01 | node | 4 | 8G |
| 172.16.50.204 | k8s-node-02 | node | 4 | 8G |
## Upgrade the system kernel

The default CentOS 7.6 kernel is 3.10.0-957.el7.x86_64. That version is too old to use cgroup v2, and we have also hit bugs with the default kernel in production, so I'll upgrade it here. This is a matter of preference: the default kernel works fine too.

Upgrading the kernel requires the elrepo yum repository. First import the elrepo key and install the elrepo repo:
```bash
[root@k8s-master-01 ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[root@k8s-master-01 ~]# rpm -Uvh https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
```
List the available kernels:

```bash
[root@k8s-master-01 ~]# yum --disablerepo="*" --enablerepo="elrepo-kernel" list available --showduplicates
```
Kernel flavors:

- kernel-lt (lt = long-term): the long-term support branch
- kernel-ml (ml = mainline): the mainline branch
Install the latest mainline kernel:

```bash
[root@k8s-master-01 ~]# yum --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel
```
Change the kernel boot order. The default entry is normally index 1, and the upgraded kernel is inserted ahead of it at index 0. (You can skip this step if you don't mind picking the kernel manually on every boot.)

```bash
[root@k8s-master-01 ~]# grub2-set-default 0 && grub2-mkconfig -o /etc/grub2.cfg
```
Confirm that the default boot kernel now points at the kernel installed above:

```bash
[root@k8s-master-01 ~]# grubby --default-kernel
```
## Enable cgroup v2

To enable cgroup v2, add `systemd.unified_cgroup_hierarchy=1` to the kernel command line. The node must be rebooted for the parameter to take effect.

```bash
yum install -y grubby
grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
grubby --info=ALL
reboot
```
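After the reboot you can verify that the unified hierarchy is actually active; the filesystem type of /sys/fs/cgroup should report `cgroup2fs`:

```bash
# cgroup2fs means the unified cgroup v2 hierarchy is mounted
stat -fc %T /sys/fs/cgroup/
# expected output: cgroup2fs
```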
## Install containerd

```bash
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum -y install containerd.io
```
## Configure containerd

```bash
mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
```
## Use the systemd cgroup driver

To use the systemd cgroup driver with runc, set the following in /etc/containerd/config.toml. Keeping the runtime on the same cgroup driver as the kubelet (which we set to `cgroupDriver: systemd` below) avoids having two different cgroup managers on one node.

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```
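If you'd rather script this than edit the file by hand, a one-liner sketch (assuming the generated default config already contains a `SystemdCgroup = false` line; on some containerd versions you have to add the key yourself):

```bash
# flip the runc SystemdCgroup flag in the generated default config
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# confirm the change landed
grep SystemdCgroup /etc/containerd/config.toml
```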
## Restart containerd

After changing the configuration file, restart containerd for it to take effect:

```bash
systemctl restart containerd
```
## Install crictl

crictl is a client tool for containerd used to manage the containers inside it; it is more convenient than the bundled ctr tool. crictl works in the k8s.io namespace, which is also the namespace the Kubernetes images live in.

```bash
VERSION="v1.22.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
```

```bash
tar zxvf crictl-$VERSION-linux-amd64.tar.gz
chown root.root crictl
mv crictl /usr/bin/
```
## Configure crictl

Edit /etc/crictl.yaml:

```yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
```
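With the endpoints configured, a quick smoke test confirms crictl can talk to containerd (both commands return empty lists on a fresh node):

```bash
# list containers and images through the CRI socket
crictl ps -a
crictl images
```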
## Deploy with kubeadm

The default upstream package repository is unreachable from China, so we use a domestic mirror. Run this on every machine:

```bash
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF
```
Install the packages on the master:

```bash
yum install -y \
    kubeadm-1.22.4-0 \
    kubectl-1.22.4-0 \
    kubelet-1.22.4-0 \
    --disableexcludes=kubernetes && \
systemctl enable kubelet
```

Install the same packages on the nodes:

```bash
yum install -y \
    kubeadm-1.22.4-0 \
    kubectl-1.22.4-0 \
    kubelet-1.22.4-0 \
    --disableexcludes=kubernetes && \
systemctl enable kubelet
```
## kubeadm configuration

Print the default init configuration (run this on one master node only):

```bash
kubeadm config print init-defaults > initconfig.yaml
```
Starting from the defaults, modify the configuration as follows. Since this is a demo environment, etcd runs inside the Kubernetes cluster.

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/k8sxio
kubernetesVersion: v1.22.4
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
controlPlaneEndpoint: 172.16.50.200:6443
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: "Node,RBAC"
    enable-admission-plugins: "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeClaimResize,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority"
    runtime-config: api/all=true
    storage-backend: etcd3
  certSANs:
  - 127.0.0.1
  - localhost
  - 172.16.50.200
  - k8s-master-01
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: "/var/lib/etcd"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr"
  strictARP: false
  syncPeriod: 15s
iptables:
  masqueradeAll: true
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
failSwapOn: true
```
For details on customizing these parameters, see the official documentation:

- https://kubernetes.io/zh/docs/setup/production-environment/tools/kubeadm/control-plane-flags/
- https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/
- https://kubernetes.io/zh/docs/setup/production-environment/tools/kubeadm/kubelet-integration/
Dry-run mode (simulates the init to check whether it would go through):

```bash
kubeadm init --config initconfig.yaml --dry-run
```
Check that the image list is correct:

```bash
kubeadm config images list --config initconfig.yaml
```
Pre-pull the images:

```bash
kubeadm config images pull --config initconfig.yaml
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.4
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.4
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.4
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.4
[config/images] Pulled registry.aliyuncs.com/k8sxio/pause:3.5
[config/images] Pulled registry.aliyuncs.com/k8sxio/etcd:3.5.0-0
failed to pull image "registry.aliyuncs.com/k8sxio/coredns:v1.8.4"
```
## Change the containerd pause image

The default sandbox_image in the containerd configuration cannot be pulled from here, so change it to the pause image we pre-pulled above:

```bash
vim /etc/containerd/config.toml
sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.5"
```
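containerd only reads config.toml at startup, so restart it once more for the new sandbox image to take effect:

```bash
systemctl restart containerd
```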
## kubeadm init

Because the coredns image pull failed above, I add the extra flag --ignore-preflight-errors="ImagePull" to ignore the image error; that image gets dealt with separately later.

```bash
kubeadm init --config initconfig.yaml --ignore-preflight-errors="ImagePull"
```
Save the token printed by init, then copy the kubectl kubeconfig; its default path is ~/.kube/config:

```bash
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
```
The init YAML is actually stored in a ConfigMap in the cluster, so we can inspect it at any time; it is used again when other nodes and masters join:

```bash
kubectl get cm -n kube-system
kubectl -n kube-system get cm kubeadm-config -o yaml
```
## Join the nodes

```bash
kubeadm join 172.16.50.200:6443 --token 4yvo6z.wv2u5tmehdhv4dc9 --discovery-token-ca-cert-hash sha256:b0c0724a1fbee5e53e3bd436902960b5aba17c298544155a0a10b219ef711266
```
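Bootstrap tokens expire after 24 hours by default; if yours has expired, generate a fresh join command on the master:

```bash
kubeadm token create --print-join-command
```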
## Install flannel

```bash
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
## Check the cluster nodes

```bash
[root@k8s-master-01 ~]# kubectl get node -o wide
NAME            STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
k8s-master-01   Ready    control-plane,master   16h   v1.22.4   172.16.50.200   <none>        CentOS Linux 7 (Core)   5.15.6-1.el7.elrepo.x86_64   containerd://1.4.12
k8s-node-01     Ready    <none>                 24m   v1.22.4   172.16.50.203   <none>        CentOS Linux 7 (Core)   5.15.6-1.el7.elrepo.x86_64   containerd://1.4.12
k8s-node-02     Ready    <none>                 23m   v1.22.4   172.16.50.204   <none>        CentOS Linux 7 (Core)   5.15.6-1.el7.elrepo.x86_64   containerd://1.4.12
```
## Adjust the kubelet flags

Edit the parameters in /var/lib/kubelet/kubeadm-flags.env to resolve two flag warnings in the kubelet log:

```bash
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
```
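The file is only read when the kubelet starts, so restart it for the new flags to apply:

```bash
systemctl daemon-reload
systemctl restart kubelet
```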
## Verify the cluster

```bash
kubectl -n kube-system get pod -o wide
```

Once all pods in the kube-system namespace are Running, test the cluster:
```bash
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: zhangguanzhang/centos
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
```
## Verify cluster DNS

```bash
[root@k8s-master-01 ~]# kubectl exec -ti busybox -- nslookup kubernetes
Server:    10.96.0.10
Address:   10.96.0.10#53

Name:      kubernetes.default.svc.cluster.local
Address:   10.96.0.1
```
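The same lookup against the nginx Service created above is a quick extra check that regular Service records resolve too:

```bash
kubectl exec -ti busybox -- nslookup nginx
```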
On the master, curl the nginx Service IP; if the nginx index page comes back, the cluster is working. For example, my nginx Service IP is 10.102.137.186:

```bash
[root@k8s-master-01 ~]# curl 10.102.137.186
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
## Switching the container engine of a Kubernetes cluster

Environment:

- OS: CentOS 7.6 (with an upgraded, current kernel)
- Container runtime: Docker CE 20.10.11
- Kubernetes: v1.22.4

```bash
[root@k8s-master-01 ~]# kubectl get node -o wide
NAME            STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master-01   Ready    control-plane,master   11m   v1.22.4   172.16.50.200   <none>        CentOS Linux 7 (Core)   5.4.163-1.el7.elrepo.x86_64   docker://20.10.11
k8s-node-01     Ready    <none>                 10m   v1.22.4   172.16.50.203   <none>        CentOS Linux 7 (Core)   5.4.163-1.el7.elrepo.x86_64   docker://20.10.11
k8s-node-02     Ready    <none>                 10m   v1.22.4   172.16.50.204   <none>        CentOS Linux 7 (Core)   5.4.163-1.el7.elrepo.x86_64   docker://20.10.11
```
## Mark the node unschedulable and evict its pods

First, check where the pods currently run:

```bash
[root@k8s-master-01 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          14m   10.244.1.2   k8s-node-01   <none>           <none>
nginx-7fb7fd49b4-pzqm6   1/1     Running   0          14m   10.244.2.2   k8s-node-02   <none>           <none>
```
Evict the pods on the node so they reschedule onto the other nodes in the cluster, as in the sketch below.
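A minimal drain sketch for this node (drain cordons the node first, then evicts its pods; `--ignore-daemonsets` is needed because flannel and kube-proxy run as DaemonSets):

```bash
kubectl drain k8s-node-02 --ignore-daemonsets
```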
Then check which node the pods that previously ran on k8s-node-02 were rescheduled to:

```bash
kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          23m     10.244.1.2   k8s-node-01   <none>           <none>
nginx-7fb7fd49b4-bg5pn   1/1     Running   0          3m18s   10.244.1.3   k8s-node-01   <none>           <none>
```
Check the node status in the cluster:

```bash
[root@k8s-master-01 ~]# kubectl get node
NAME            STATUS                     ROLES                  AGE   VERSION
k8s-master-01   Ready                      control-plane,master   33m   v1.22.4
k8s-node-01     Ready                      <none>                 32m   v1.22.4
k8s-node-02     Ready,SchedulingDisabled   <none>                 32m   v1.22.4
```

As shown above, k8s-node-02 can no longer be scheduled. Now we can switch its container engine.
Uninstall the existing Docker packages:

```bash
rpm -qa | grep docker
rpm -e docker-ce-20.10.11-3.el7.x86_64 docker-ce-cli-20.10.11-3.el7.x86_64 docker-ce-rootless-extras-20.10.11-3.el7.x86_64 docker-scan-plugin-0.9.0-3.el7.x86_64
```
For installing containerd, refer to the earlier part of this article; the procedure is identical, so I won't repeat it here. Then configure the kubelet to use containerd:

```bash
vim /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
```
Restart the kubelet:

```bash
systemctl daemon-reload
systemctl restart kubelet
```
Verify that the container engine was successfully switched to containerd:

```bash
[root@k8s-master-01 ~]# kubectl get node -o wide
NAME            STATUS                     ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master-01   Ready                      control-plane,master   51m   v1.22.4   172.16.50.200   <none>        CentOS Linux 7 (Core)   5.4.163-1.el7.elrepo.x86_64   docker://20.10.11
k8s-node-01     Ready                      <none>                 51m   v1.22.4   172.16.50.203   <none>        CentOS Linux 7 (Core)   5.4.163-1.el7.elrepo.x86_64   docker://20.10.11
k8s-node-02     Ready,SchedulingDisabled   <none>                 51m   v1.22.4   172.16.50.204   <none>        CentOS Linux 7 (Core)   5.4.163-1.el7.elrepo.x86_64   containerd://1.4.12
```
Mark the node schedulable again to bring it back online:

```bash
kubectl uncordon <node-to-drain>
```
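After uncordoning, k8s-node-02 should report plain `Ready` again, without `SchedulingDisabled`:

```bash
kubectl get node k8s-node-02
```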