Setting Up a Highly Available Kubernetes Cluster with kubeadm

Many of the kubeadm articles out there are bad examples or simply not detailed enough; most skip the system-level preparation and jump straight to kubeadm init, which is why so many people following them run into errors.

I expect readers of this article to have at least the following background:

  • Basic Linux directory conventions and systemd
  • Some Docker experience
  • Understanding of DNS and /etc/hosts, combined with curl, to test the response status of web endpoints
  • You don't need your own GitHub projects, but you should at least be able to browse GitHub

This tutorial deploys a Kubernetes cluster with the nodes and specs listed below, on CentOS 8.1 (8.2 if you can get it). Do not use CentOS 7.4 or earlier: container technology depends heavily on the kernel, and deploying on old kernels leads to a lot of problems at runtime.

IP            Hostname   Role     CPU   Memory
10.15.1.250   -          VIP      -     -
10.15.1.17    K8S-M1     master   4     8G
10.15.1.18    K8S-M2     master   4     8G
10.15.1.19    K8S-M3     master   4     8G
10.15.1.20    K8S-N1     node     4     8G
10.15.1.21    K8S-N2     node     4     8G
  • All operations are performed as root. Make the system disk reasonably large; otherwise, once image usage gets high (e.g. 85%), kubelet garbage collection will start reclaiming images.
  • For high availability, an odd number of masters (3 or more) is generally recommended; I use 3 masters here.
  • A single master also works and the procedure is nearly identical; I point out the differences in the article. With a single master, simply leave out the other masters' IPs.

Preparation (every machine)

System-level settings

Assume the system was just installed from the official ISO with no further configuration (set up networking and DNS yourself).

  • All firewalls and SELinux are disabled. On CentOS:
    Otherwise K8S may later report Permission denied when mounting directories. Note that on some cloud providers (e.g. QingCloud) the IP is managed by NetworkManager; stopping it breaks networking, so there you can leave it running.

    systemctl disable --now firewalld NetworkManager
    setenforce 0
    sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config
    ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
  • Disable dnsmasq (optional)
    If dnsmasq is enabled (e.g. in a GUI environment), the system DNS server is set to 127.0.0.1, which prevents Docker containers from resolving domain names, so it needs to be disabled.

    systemctl disable --now dnsmasq
  • Kubernetes recommends disabling swap. Run the following on every machine to turn swap off and comment out the swap line in /etc/fstab. If you'd rather keep swap, skip this; a configuration option later accounts for it:

    swapoff -a && sysctl -w vm.swappiness=0
    sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
  • Install some basic dependencies and tools

    yum install epel-release -y
    yum install -y \
    curl \
    wget \
    git \
    conntrack-tools \
    psmisc \
    nfs-utils \
    jq \
    socat \
    bash-completion \
    ipset \
    ipvsadm \
    conntrack \
    libseccomp \
    net-tools \
    crontabs \
    sysstat \
    unzip \
    bind-utils \
    tcpdump \
    telnet \
    lsof \
    htop
  • If kube-proxy is going to use ipvs mode, the following kernel modules must be loaded at boot. Per convention, load them with systemd-modules-load rather than putting modprobe calls in /etc/rc.local:

    :> /etc/modules-load.d/ipvs.conf
    module=(
    ip_vs
    ip_vs_rr
    ip_vs_wrr
    ip_vs_sh
    nf_conntrack
    br_netfilter
    )
    for kernel_module in ${module[@]};do
    /sbin/modinfo -F filename $kernel_module |& grep -qv ERROR && echo $kernel_module >> /etc/modules-load.d/ipvs.conf || :
    done
    systemctl restart systemd-modules-load.service

If the systemctl restart above (or a later systemctl enable) fails, run systemctl status -l systemd-modules-load.service to see which kernel module could not be loaded, comment it out in /etc/modules-load.d/ipvs.conf, and try again.
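
To confirm the modules actually loaded, a quick check (module names as written to the file above):

    systemctl status -l systemd-modules-load.service --no-pager
    lsmod | grep -e ip_vs -e nf_conntrack -e br_netfilter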

  • Every machine needs the kernel parameters in /etc/sysctl.d/k8s.conf. IPv6 support is currently not great, so IPv6 is disabled here as well.

    cat <<EOF > /etc/sysctl.d/k8s.conf
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1
    net.ipv4.neigh.default.gc_stale_time = 120
    net.ipv4.conf.all.rp_filter = 0
    net.ipv4.conf.default.rp_filter = 0
    net.ipv4.conf.default.arp_announce = 2
    net.ipv4.conf.lo.arp_announce = 2
    net.ipv4.conf.all.arp_announce = 2
    net.ipv4.ip_forward = 1
    net.ipv4.tcp_max_tw_buckets = 5000
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_max_syn_backlog = 1024
    net.ipv4.tcp_synack_retries = 2
    # make iptables process bridged traffic (needed by kube-proxy and most CNI plugins)
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-arptables = 1
    net.netfilter.nf_conntrack_max = 2310720
    fs.inotify.max_user_watches=89100
    fs.may_detach_mounts = 1
    fs.file-max = 52706963
    fs.nr_open = 52706963
    vm.overcommit_memory=1
    vm.panic_on_oom=0
    EOF

    sysctl --system
  • If you chose to disable swap, also disable it at the kernel level; skip this if swap stays on:

    echo 'vm.swappiness = 0' >> /etc/sysctl.d/k8s.conf
  • If kube-proxy uses ipvs, tune the TCP keepalive parameters to avoid timeouts on long-lived connections:

    cat <<EOF >> /etc/sysctl.d/k8s.conf
    # https://github.com/moby/moby/issues/31208
    # ipvsadm -l --timeout
    # fixes long-connection timeouts in ipvs mode; any value below 900 works
    net.ipv4.tcp_keepalive_time = 600
    net.ipv4.tcp_keepalive_intvl = 30
    net.ipv4.tcp_keepalive_probes = 10
    EOF
    sysctl --system
  • Raise the default open-file and core limits for services started by systemd, and turn off reverse DNS lookups for SSH:

    sed -ri 's/^#(DefaultLimitCORE)=/\1=100000/' /etc/systemd/system.conf
    sed -ri 's/^#(DefaultLimitNOFILE)=/\1=100000/' /etc/systemd/system.conf

    sed -ri 's/^#(UseDNS )yes/\1no/' /etc/ssh/sshd_config
  • Maximum open files; per convention, put this in a drop-in file:

    cat>/etc/security/limits.d/kubernetes.conf<<EOF
    * soft nproc 131072
    * hard nproc 131072
    * soft nofile 131072
    * hard nofile 131072
    root soft nproc 131072
    root hard nproc 131072
    root soft nofile 131072
    root hard nofile 131072
    EOF
  • Cluster HA depends on the nodes agreeing on the time; install and configure chrony:

    yum install -y chrony
    cat>/etc/chrony.conf<<EOF
    server cn.pool.ntp.org iburst minpoll 4 maxpoll 10
    server s1b.time.edu.cn iburst minpoll 4 maxpoll 10
    # Ignore the stratum of sources when selecting
    stratumweight 0

    # Record the rate at which the system clock gains/losses time.
    driftfile /var/lib/chrony/chrony.drift

    # This directive enables kernel synchronisation (every 11 minutes) of the
    # real-time clock. Note that it can’t be used along with the 'rtcfile' directive.
    rtcsync

    # Allow the system clock to be stepped in the first three updates
    # if its offset is larger than 1 second.
    makestep 1.0 3


    # Enable hardware timestamping on all interfaces that support it.
    #hwtimestamp *

    # Increase the minimum number of selectable sources required to adjust
    # the system clock.
    #minsources 2

    bindcmdaddress 127.0.0.1

    #bindcmdaddress ::1

    # Specify file containing keys for NTP authentication.
    keyfile /etc/chrony/chrony.keys

    logdir /var/log/chrony
    # log any adjustment bigger than 1 second
    logchange 1
    EOF

    systemctl enable --now chronyd
  • Set the hostname
    kubelet and kube-proxy report node information using the hostname by default (unless overridden with --hostname-override), so set the hostname on each machine:

    hostnamectl set-hostname xxx
  • Add the cluster entries to /etc/hosts on every host:

    cat >>/etc/hosts << EOF
    10.15.1.17 k8s-m1
    10.15.1.18 k8s-m2
    10.15.1.19 k8s-m3
    10.15.1.20 k8s-n1
    10.15.1.21 k8s-n2
    EOF
  • Docker's official kernel check script recommends (RHEL7/CentOS7: User namespaces disabled; add 'user_namespace.enable=1' to boot command line); enable it with:

    grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
  • Reboot the system

    reboot
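
After the reboot, it is worth confirming that the settings above survived; a minimal verification sketch (values as configured earlier):

    getenforce                                  # expect: Disabled
    swapon -s                                   # expect: no output if swap was turned off
    sysctl net.bridge.bridge-nf-call-iptables   # expect: 1
    sysctl net.ipv4.ip_forward                  # expect: 1
    ulimit -n                                   # expect: 131072 in a fresh login shell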

Installing Docker

Here we install Docker with Docker's official install script (it supports CentOS and Ubuntu). For production it is still better to follow the installation steps on the official website.

export VERSION=19.03
curl -fsSL "https://get.docker.com/" | bash -s -- --mirror Aliyun

Then write the Docker daemon configuration:

mkdir -p /etc/docker/
cat>/etc/docker/daemon.json<<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"registry-mirrors": [
"https://fz5yth0r.mirror.aliyuncs.com",
"http://hub-mirror.c.163.com/",
"https://docker.mirrors.ustc.edu.cn/",
"https://registry.docker-cn.com"
],
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
}
}
EOF

Do NOT enable Live Restore: in some corner cases (e.g. containers stuck in the Dead state) the only fix is restarting the Docker daemon, and with live-restore enabled the only remaining fix is rebooting the machine.

  • Enable Docker at boot. On CentOS, Docker command completion has to be set up manually after installation:

    yum install -y epel-release bash-completion && \
    cp /usr/share/bash-completion/completions/docker /etc/bash_completion.d/
  • Since 1.13, Docker sets the default iptables FORWARD policy to DROP, which can break the packet forwarding Kubernetes depends on. Add the following drop-in to the Docker daemon to correct this (the blunt alternative is iptables -P FORWARD ACCEPT):

    mkdir -p /etc/systemd/system/docker.service.d/
    cat>/etc/systemd/system/docker.service.d/10-docker.conf<<EOF
    [Service]
    ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT
    ExecStopPost=/bin/bash -c '/sbin/iptables -D FORWARD -s 0.0.0.0/0 -j ACCEPT &> /dev/null || :'
    EOF
  • Start Docker and check that its info looks sane:

    systemctl enable --now docker
    docker info
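    In the docker info output, the main things to confirm are that the cgroup driver is systemd and the storage driver is overlay2, matching daemon.json above; a quick sketch:

    docker info 2>/dev/null | grep -iE 'cgroup driver|storage driver'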
  • If enabling Docker (or any other service) fails, debug it by running the service command in the foreground. Here is how, using the kubelet unit as an example:

    $ systemctl cat kubelet
    # /usr/lib/systemd/system/kubelet.service
    [Unit]
    Description=Kubernetes Kubelet
    Documentation=https://github.com/kubernetes/kubernetes
    After=docker.service
    Requires=docker.service

    [Service]
    ExecStart=/usr/local/bin/kubelet \
    --bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \
    --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
    --config=/etc/kubernetes/kubelet-conf.yml \
    --hostname-override=k8s-m1 \
    --pod-infra-container-image=100.64.2.62:9999/pause-amd64:3.1 \
    --allow-privileged=true \
    --network-plugin=cni \
    --cni-conf-dir=/etc/cni/net.d \
    --cni-bin-dir=/opt/cni/bin \
    --cert-dir=/etc/kubernetes/pki \
    --logtostderr=false \
    --log-dir=/var/log/kubernetes/kubelet \
    --v=2

    Restart=always
    RestartSec=10s

    [Install]
    WantedBy=multi-user.target

    Copy the ExecStart part and run it in a terminal, dropping the --logtostderr and --log-dir options so logs print to the foreground; --v is the log verbosity level (1-8).

  • Also, if kubelet fails to start with the error below, enable IPv6:

    docker_service.go:401] Streaming server stopped unexpectedly: listen tcp [::1]:0: bind: cannot assign requested address

Deploying with kubeadm

Install kubeadm and related packages

The default repository is hosted abroad and often unreachable, so we use a domestic mirror. Run this on every machine:

cat <<EOF >/etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF
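
Before pinning 1.18.8 below, you can check which versions the mirror actually provides (standard yum usage):

    yum list --showduplicates kubeadm --disableexcludes=kubernetes | tail -n 5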
Install the packages on the masters

A k8s node is essentially kubelet + a CRI (usually Docker). kubectl is a CLI client that reads a kubeconfig and talks to kube-apiserver to operate the cluster, and kubeadm handles the deployment, so the masters need all three. Worker nodes generally don't need kubectl, but yum would pull in the latest version anyway, so I install a pinned kubectl on the nodes as well.

yum install -y \
kubeadm-1.18.8 \
kubectl-1.18.8 \
kubelet-1.18.8 \
--disableexcludes=kubernetes && \
systemctl enable kubelet
Install the packages on the nodes
yum install -y \
kubeadm-1.18.8 \
kubectl-1.18.8 \
kubelet-1.18.8 \
--disableexcludes=kubernetes && \
systemctl enable kubelet

Passing extra kubelet arguments (if needed)

Look at kubelet's systemd unit:

systemctl cat kubelet

We can see that /etc/sysconfig/kubelet is referenced as an EnvironmentFile, and its comments explain that KUBELET_EXTRA_ARGS is where extra kubelet runtime options go. Below is an example; run kubelet --help for the available flags.

cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--xxx=yyy --aaa=bbb"
EOF

The file /var/lib/kubelet/kubeadm-flags.env works the same way.

Configuring HA

  • Install haproxy + keepalived on all three masters (see the install sketch below)
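    The installation itself is just the distro packages; a minimal sketch (assuming the stock CentOS packages are recent enough for the configs below):

    yum install -y haproxy keepalived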

haproxy configuration file

global
maxconn 2000
ulimit-n 16384
log 127.0.0.1 local0 err
stats timeout 30s

defaults
log global
mode http
option httplog
timeout connect 5000
timeout client 50000
timeout server 50000
timeout http-request 15s
timeout http-keep-alive 15s

frontend monitor-in
bind *:33305
mode http
option httplog
monitor-uri /monitor

listen stats
bind *:8006
mode http
stats enable
stats hide-version
stats uri /stats
stats refresh 30s
stats realm Haproxy\ Statistics
stats auth admin:admin

frontend k8s-api
bind 0.0.0.0:8443
bind 127.0.0.1:8443
mode tcp
option tcplog
tcp-request inspect-delay 5s
default_backend k8s-api

backend k8s-api
mode tcp
option tcplog
option httpchk GET /healthz
http-check expect string ok
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server api1 10.15.1.17:6443 check check-ssl verify none
server api2 10.15.1.18:6443 check check-ssl verify none
server api3 10.15.1.19:6443 check check-ssl verify none
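
The content above is haproxy's config; write it to /etc/haproxy/haproxy.cfg and syntax-check it (a sketch, assuming the stock package layout):

    haproxy -c -f /etc/haproxy/haproxy.cfg   # should report the configuration as valid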

keepalived configuration file

global_defs {
enable_script_security
}

vrrp_script haproxy-check {
user root
script "/bin/bash /etc/keepalived/check_haproxy.sh"
interval 3
weight -2
fall 10
rise 2
}

vrrp_instance haproxy-vip {
state BACKUP
priority 101
interface eth0
virtual_router_id 47
advert_int 3
unicast_src_ip 10.15.1.17 # local node IP
unicast_peer {
10.15.1.18 # peer IP
10.15.1.19 # peer IP
}

virtual_ipaddress {
10.15.1.250/24 # VIP address
}

track_script {
haproxy-check
}
}

A note on keepalived: by default it uses multicast; adding unicast_peer switches it to unicast. The configuration therefore differs slightly on each of the three masters: unicast_src_ip is the node's own IP and unicast_peer lists the other two nodes' IPs. The priority/weight and the BACKUP state are the same on all three. The file lives at /etc/keepalived/keepalived.conf.

keepalived health-check script

cat <<'EOF'> /etc/keepalived/check_haproxy.sh
#!/bin/bash
VIRTUAL_IP=10.15.1.250

errorExit() {
echo "*** $*" 1>&2
exit 1
}

if ip addr | grep -q $VIRTUAL_IP ; then
curl -s --max-time 2 --insecure https://${VIRTUAL_IP}:8443/healthz -o /dev/null || errorExit "Error GET https://${VIRTUAL_IP}:8443/healthz"
fi
EOF
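
Make the script executable and bring both services up (a sketch; keepalived reads /etc/keepalived/keepalived.conf by default):

    chmod +x /etc/keepalived/check_haproxy.sh
    systemctl enable --now haproxy keepalived
    ip addr show eth0 | grep 10.15.1.250   # the VIP should show up on exactly one of the three masters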

Setting up the external etcd cluster

openssl certificate configuration file

mkdir -p /etc/kubernetes/pki/etcd
cd /etc/kubernetes/pki

cat <<EOF> /etc/kubernetes/pki/openssl.cnf
[ req ]
default_bits = 2048
default_md = sha256
distinguished_name = req_distinguished_name

[req_distinguished_name]

[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage = critical, digitalSignature, keyEncipherment, keyCertSign

[ v3_req_server ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth

[ v3_req_client ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth

[ v3_req_apiserver ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names_cluster

[ v3_req_etcd ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names_etcd

[ alt_names_cluster ]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
DNS.5 = k8s-master1
DNS.6 = k8s-master2
DNS.7 = k8s-master3
DNS.8 = localhost
IP.1 = 10.96.0.1
IP.2 = 127.0.0.1
IP.3 = 10.0.7.100
IP.4 = 10.0.7.101
IP.5 = 10.0.7.102
IP.6 = 10.0.7.103

[ alt_names_etcd ]
DNS.1 = localhost
DNS.2 = k8s-m1
DNS.3 = k8s-m2
DNS.4 = k8s-m3
IP.1 = 10.15.1.17
IP.2 = 10.15.1.18
IP.3 = 10.15.1.19
IP.4 = 127.0.0.1
EOF

Generate the CA certificate

etcd-ca

openssl genrsa -out etcd/ca.key 2048
openssl req -x509 -new -nodes -key etcd/ca.key -config openssl.cnf -subj "/CN=etcd-ca" -extensions v3_ca -out etcd/ca.crt -days 10000

Generate the certificates

apiserver-etcd-client

openssl genrsa -out apiserver-etcd-client.key 2048
openssl req -new -key apiserver-etcd-client.key -subj "/CN=apiserver-etcd-client/O=system:masters" -out apiserver-etcd-client.csr
openssl x509 -in apiserver-etcd-client.csr -req -CA etcd/ca.crt -CAkey etcd/ca.key -CAcreateserial -extensions v3_req_etcd -extfile openssl.cnf -out apiserver-etcd-client.crt -days 10000

kube-etcd

openssl genrsa -out etcd/server.key 2048
openssl req -new -key etcd/server.key -subj "/CN=etcd-server" -out etcd/server.csr
openssl x509 -in etcd/server.csr -req -CA etcd/ca.crt -CAkey etcd/ca.key -CAcreateserial -extensions v3_req_etcd -extfile openssl.cnf -out etcd/server.crt -days 10000

kube-etcd-peer

openssl genrsa -out etcd/peer.key 2048
openssl req -new -key etcd/peer.key -subj "/CN=etcd-peer" -out etcd/peer.csr
openssl x509 -in etcd/peer.csr -req -CA etcd/ca.crt -CAkey etcd/ca.key -CAcreateserial -extensions v3_req_etcd -extfile openssl.cnf -out etcd/peer.crt -days 10000

kube-etcd-healthcheck-client

openssl genrsa -out etcd/healthcheck-client.key 2048
openssl req -new -key etcd/healthcheck-client.key -subj "/CN=etcd-client" -out etcd/healthcheck-client.csr
openssl x509 -in etcd/healthcheck-client.csr -req -CA etcd/ca.crt -CAkey etcd/ca.key -CAcreateserial -extensions v3_req_etcd -extfile openssl.cnf -out etcd/healthcheck-client.crt -days 10000

Clean up the .csr and .srl files

find . -name "*.csr" -o -name "*.srl"|xargs  rm -f
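
Optionally verify that the generated server certificate carries the expected SANs (standard openssl usage):

    openssl x509 -in etcd/server.crt -noout -text | grep -A1 'Subject Alternative Name'
    # expect the [ alt_names_etcd ] entries: localhost, k8s-m1..m3, 10.15.1.17-19, 127.0.0.1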

Distribute the certificates

Copy the etcd certificates from the first master to the other master nodes:

scp -r /etc/kubernetes root@k8s-m2:/etc
scp -r /etc/kubernetes root@k8s-m3:/etc

Configure etcd

# https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#breaking-changes

mkdir -p /var/lib/etcd

chmod 0700 /var/lib/etcd # starting with etcd 3.4.10 the data directory must be 0700, see the changelog above

wget https://mirrors.huaweicloud.com/etcd/v3.4.13/etcd-v3.4.13-linux-amd64.tar.gz

tar xf etcd-v3.4.13-linux-amd64.tar.gz --strip-components=1 -C /usr/bin/ etcd-v3.4.13-linux-amd64/{etcd,etcdctl}

Set up the unit file and start etcd. On the other two nodes, change ETCD_NAME to etcd1 and etcd2 respectively and set ETCD_IP to that node's own IP.

ETCD_NAME=etcd0
ETCD_IP="10.15.1.17"
ETCD_IPS=(10.15.1.17 10.15.1.18 10.15.1.19)

cat<<EOF> /usr/lib/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
ExecStart=/usr/bin/etcd \\
--name=${ETCD_NAME} \\
--data-dir=/var/lib/etcd \\
--listen-client-urls=https://127.0.0.1:2379,https://${ETCD_IP}:2379 \\
--advertise-client-urls=https://${ETCD_IP}:2379 \\
--listen-peer-urls=https://${ETCD_IP}:2380 \\
--initial-advertise-peer-urls=https://${ETCD_IP}:2380 \\
--cert-file=/etc/kubernetes/pki/etcd/server.crt \\
--key-file=/etc/kubernetes/pki/etcd/server.key \\
--client-cert-auth \\
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \\
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \\
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \\
--peer-client-cert-auth \\
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \\
--initial-cluster=etcd0=https://${ETCD_IPS[0]}:2380,etcd1=https://${ETCD_IPS[1]}:2380,etcd2=https://${ETCD_IPS[2]}:2380 \\
--initial-cluster-token=my-etcd-token \\
--initial-cluster-state=new \\
--heartbeat-interval 1000 \\
--election-timeout 5000

Restart=always
RestartSec=10s
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart etcd
systemctl enable etcd
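
Once etcd is started on all three nodes, a one-off health check with explicit TLS flags (the aliases in the next section wrap the same thing):

    ETCDCTL_API=3 etcdctl \
      --cacert /etc/kubernetes/pki/etcd/ca.crt \
      --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
      --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
      --endpoints https://10.15.1.17:2379,https://10.15.1.18:2379,https://10.15.1.19:2379 \
      endpoint health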

Configure etcd environment variables

Create an etcd.sh file under /etc/profile.d with the following content.

cat >/etc/profile.d/etcd.sh<<'EOF'
ETCD_CERT_DIR=/etc/kubernetes/pki/etcd/
ETCD_CA_FILE=ca.crt
ETCD_KEY_FILE=healthcheck-client.key
ETCD_CERT_FILE=healthcheck-client.crt
ETCD_EP=https://10.15.1.17:2379,https://10.15.1.18:2379,https://10.15.1.19:2379

alias etcd_v2="etcdctl --cert-file ${ETCD_CERT_DIR}/${ETCD_CERT_FILE} \
--key-file ${ETCD_CERT_DIR}/${ETCD_KEY_FILE} \
--ca-file ${ETCD_CERT_DIR}/${ETCD_CA_FILE} \
--endpoints $ETCD_EP"

alias etcd_v3="ETCDCTL_API=3 \
etcdctl \
--cert ${ETCD_CERT_DIR}/${ETCD_CERT_FILE} \
--key ${ETCD_CERT_DIR}/${ETCD_KEY_FILE} \
--cacert ${ETCD_CERT_DIR}/${ETCD_CA_FILE} \
--endpoints $ETCD_EP"

function etcd-ha(){
etcd_v3 endpoint status --write-out=table
}

EOF

Source the file (or log in again), then check the etcd API version:

[root@k8s-m1 profile.d]# etcd_v3 version
etcdctl version: 3.4.13
API version: 3.4

Query etcd status

[root@k8s-m1 lib]# etcd_v3 --write-out=table endpoint status
+-------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://10.15.1.17:2379 | aa86d9e8ed413b73 | 3.4.13 | 1.9 MB | false | 2 | 210645 |
| https://10.15.1.18:2379 | 5e699870e3d2ee5b | 3.4.13 | 1.9 MB | true | 2 | 210645 |
| https://10.15.1.19:2379 | d261643c99e6fbc7 | 3.4.13 | 1.9 MB | false | 2 | 210645 |
+-------------------------+------------------+---------+---------+-----------+-----------+------------+

Cluster configuration (done on the first master)

Print the default init configuration:

kubeadm config print init-defaults > initconfig.yaml

Let's look at the default init parameters:

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-m1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}

We mainly care about the ClusterConfiguration section; keep only that and modify it. Refer to the v1beta2 docs below. Older versions may use v1beta1, where some fields differ from the new ones, so look up the godoc for your version:
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#hdr-Basics
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#pkg-constants
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#ClusterConfiguration

Set controlPlaneEndpoint to the VIP. The final yaml is below:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/k8sxio
kubernetesVersion: v1.18.8
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
controlPlaneEndpoint: 10.15.1.250:8443 # for a single master, use that master's IP or omit this
apiServer: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#APIServer
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: "Node,RBAC"
    enable-admission-plugins: "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeClaimResize,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority,PodPreset"
    runtime-config: api/all=true,settings.k8s.io/v1alpha1=true
    storage-backend: etcd3
  certSANs:
  - 127.0.0.1 # lets you fall back to localhost for debugging if the load balancer breaks
  - localhost
  - 10.15.1.18
  - 10.15.1.19
  - k8s-m2
  - k8s-m3
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
dns:
  type: CoreDNS
etcd:
  external:
    endpoints:
    - https://10.15.1.17:2379
    - https://10.15.1.18:2379
    - https://10.15.1.19:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs # or iptables
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr" # scheduling algorithm
  strictARP: false
  syncPeriod: 15s
iptables:
  masqueradeAll: true
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
failSwapOn: true # set to false if swap stays enabled
  • The certSANs list only needs the IPs and hostnames of the other two masters, because kubeadm already adds a set of its own; duplicating them produces repeated IPs and DNS names in the certificate (inspect with openssl x509 -in apiserver.crt -text -noout). This comes from kubeadm not de-duplicating, a bug scheduled to be fixed in 1.18.

  • For swap, see the last line (failSwapOn). The apiserver extraArgs are there to enable PodPreset. For a single master, change the value of controlPlaneEndpoint to the first master's IP.

  • If kubectl get cs later shows controller-manager and scheduler as Unhealthy, delete --port=0 from their static pod manifests (see the sketch after the node status output below).

  • Check the file for mistakes. Warnings can be ignored; real problems are thrown as errors. If everything is fine, the output ends with something containing a kubeadm join xxx string:

kubeadm init --config initconfig.yaml --dry-run

Check that the image list is correct:

kubeadm config images list --config initconfig.yaml

Pre-pull the images:

kubeadm config images pull --config initconfig.yaml # output below
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-apiserver:v1.18.8
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.18.8
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-scheduler:v1.18.8
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-proxy:v1.18.8
[config/images] Pulled registry.aliyuncs.com/k8sxio/pause:3.2
[config/images] Pulled coredns/coredns:1.6.9

kubeadm init

Run init only on the first master:

# --upload-certs uploads the relevant certificates into the cluster for storage, so we don't have to distribute them by hand

kubeadm init --config initconfig.yaml --upload-certs

Save the token printed by init, and copy the kubeconfig for kubectl; kubectl's default kubeconfig path is ~/.kube/config:

mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config

The init yaml is actually stored in a configmap in the cluster, so we can look at it at any time; it is also used when other nodes and masters join:

kubectl -n kube-system get cm kubeadm-config -o yaml

Configure the control-plane components on the other masters (some older versions don't support uploading certificates, in which case they must be copied by hand)

Copy the CA certificates from the first master to the other master nodes:

scp -r /etc/kubernetes/pki root@k8s-m2:/etc/kubernetes/
scp -r /etc/kubernetes/pki root@k8s-m3:/etc/kubernetes/

Join the other masters:

kubeadm join 10.15.1.250:8443 \
--token xxx.zzzzzzzzz \
--discovery-token-ca-cert-hash sha256:xxxxxxxxxxx --control-plane

The sha256 value can be obtained with:

openssl x509 -pubkey -in \
/etc/kubernetes/pki/ca.crt | \
openssl rsa -pubin -outform der 2>/dev/null | \
openssl dgst -sha256 -hex | sed 's/^.* //'
  • If the cluster was initialized with --upload-certs so that the certificates were uploaded and stored in the cluster, the other masters must also pass --certificate-key when joining.
  • If you forgot the token, list it with kubeadm token list or create a new one with kubeadm token create. Recent versions support kubeadm token create --print-join-command; older versions may not support --print-join-command, in which case create the token without that option.
  • Adding --upload-certs to kubeadm init temporarily uploads the control-plane certificates to a Secret in the cluster. Note that this Secret expires automatically after 2 hours. The certificates are encrypted with a 32-byte key, which can be specified with --certificate-key.

The following phase command can be used to re-upload the certificates after they expire:

kubeadm init phase upload-certs --upload-certs --certificate-key=SOME_VALUE

If --certificate-key is not passed to kubeadm init or kubeadm init phase upload-certs, a new key is generated automatically.

The following command generates a new key on demand:

kubeadm alpha certs certificate-key
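
Worker nodes join with the same kubeadm join command, just without --control-plane (and without --certificate-key); a sketch reusing the placeholder token and hash from above:

    kubeadm join 10.15.1.250:8443 \
      --token xxx.zzzzzzzzz \
      --discovery-token-ca-cert-hash sha256:xxxxxxxxxxx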

Set up kubectl completion:

kubectl completion bash > /etc/bash_completion.d/kubectl

Check node status:

[root@k8s-m1 kubernetes]# kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
k8s-m1   NotReady   master   23h   v1.15.6
k8s-m2   NotReady   master   23h   v1.15.6
k8s-m3   NotReady   master   23h   v1.15.6
k8s-n1   NotReady   <none>   23h   v1.15.6
k8s-n2   NotReady   <none>   23h   v1.15.6


[root@k8s-m1 kubernetes]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
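
If controller-manager and scheduler show up here as Unhealthy, remove --port=0 from their static pod manifests as noted earlier; a minimal sketch (run on every master, kubelet recreates the pods automatically):

    sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml
    sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml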

Addons (from here to the end, run on any one master)

The container network is not set up yet. I deploy flannel here; if you know BGP you can use calico instead.
The yaml comes from flannel's official GitHub: https://github.com/coreos/flannel/tree/master/Documentation

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Verify the cluster works

kubectl -n kube-system get pod -o wide

Once every pod in the kube-system namespace is Running, test the cluster:

cat<<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: zhangguanzhang/centos
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF

Wait for the pods to be Running:

$ kubectl get po,svc -o wide
NAME                         READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
pod/busybox                  1/1     Running   0          4m4s   10.244.2.18   k8s-n1   <none>           <none>
pod/nginx-5c559d5697-2ctxh   1/1     Running   0          4m4s   10.244.2.16   k8s-n1   <none>           <none>

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE    SELECTOR
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   12m    <none>
service/nginx        ClusterIP   10.100.39.101   <none>        80/TCP    4m4s   app=nginx

Verify cluster DNS:

$ kubectl exec -ti busybox -- nslookup kubernetes
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

On a master, curl the nginx service IP; if the nginx index page comes back, the cluster is working. In my case the nginx svc IP is 10.100.39.101:

$ curl -s 10.100.39.101
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Log management

Using kube-apiserver as the example, mount its logs out to the host to make them easier to manage (kube-controller-manager and kube-scheduler work the same way):

spec:
  containers:
  - command:
    - --logtostderr=false
    - --log-dir=/var/log/kubernetes/kube-apiserver
    - --v=2

    volumeMounts:
    - mountPath: /var/log/kubernetes/kube-apiserver
      name: k8s-logs

  volumes:
  - hostPath:
      path: /var/log/kubernetes/kube-apiserver
      type: DirectoryOrCreate
    name: k8s-logs

kubelet logs (kubelet is managed by systemd rather than running as a container, so there is nothing to mount):

[root@k8s-m1 manifests]# vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--v=2 --logtostderr=false --log-dir=/var/log/kubernetes/kubelet"
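
The log directory has to exist and kubelet must be restarted for this to take effect (a sketch):

    mkdir -p /var/log/kubernetes/kubelet
    systemctl restart kubelet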

Author: Mr.Ye — published 2019-12-03 — https://system51.github.io/2019/12/03/kubeadm-base-use/