Kubernetes: The Scheduler and the Scheduling Process

Scheduler

  When a user asks the API server to create a new Pod, the API server first performs authentication, authorization and admission checks. If everything passes, the request is handed to the Scheduler, which examines the list of nodes that satisfy the Pod's requirements and picks a home (Node) for it. Once a Node has been selected, the result is returned to the API server and the binding is written to etcd. The kubelet on the chosen Node then creates the Pod (more precisely, each kubelet continuously watches the API server for events related to its own node, fetches the Pod's manifest from the API server, and creates the Pod according to that manifest). In short, the Scheduler binds each pending Pod to a suitable Node in the cluster according to a specific scheduling algorithm and policy. Three objects are involved in the whole scheduling process: the list of Pods waiting to be scheduled, the list of available Nodes, and the scheduling algorithms and policies.

The scheduling flow provided by the Kubernetes Scheduler consists of three steps:

  • Predicates (预选): iterate over the node list and filter out the candidate nodes that satisfy the Pod's requirements. Kubernetes ships with a number of built-in predicate rules to choose from.
  • Priorities (优选): score every candidate node according to the priority rules; the node with the highest score is preferred.
  • Select (选定): if several nodes tie for the highest score, one of them is picked at random.

(Figure: scheduler-1, the predicate / priority / select scheduling flow.)
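As an aside, which scheduler runs this flow can be chosen per Pod via spec.schedulerName (it defaults to default-scheduler). A minimal sketch, with a hypothetical Pod name:

apiVersion: v1
kind: Pod
metadata:
  name: scheduler-demo              # hypothetical name
spec:
  schedulerName: default-scheduler  # spelled out here; this is also the default value
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent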

Advanced scheduling

When we want Pods to land on specific, expected nodes, the following advanced scheduling mechanisms are available:

  • Node selectors: nodeSelector (the Pod only runs on hosts whose labels match) and nodeName (the Pod only runs on the named node; a minimal nodeName sketch follows this list)
  • Node affinity: nodeAffinity
  • Pod affinity: podAffinity
  • Pod anti-affinity: podAntiAffinity
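Only nodeSelector and the affinity families are demonstrated below; for completeness, here is a minimal, hypothetical nodeName sketch. nodeName bypasses the scheduler entirely: the kubelet on the named node runs the Pod directly.

apiVersion: v1
kind: Pod
metadata:
  name: nodename-demo       # hypothetical name
spec:
  nodeName: node-1          # no scheduling takes place; the kubelet on node-1 runs this Pod
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent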

The affinity rules break down as follows:

  • nodeAffinity: decides which hosts a Pod may, and may not, be scheduled onto; it deals with the relationship between Pods and hosts.
  • podAffinity: decides which Pods a Pod may be co-located with in the same topology domain (a topology domain is defined by node labels and can be a single host, or a group of hosts forming a cluster, zone, and so on). podAntiAffinity is the inverse: it decides which Pods a Pod must not share a topology domain with. Both podAffinity and podAntiAffinity deal with relationships between Pods inside the Kubernetes cluster, for example "that Pod runs on this node, so I must run on the same node", or "that Pod runs on this node, so I refuse to run on the same node".
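A topology domain is simply a node label shared by one or more nodes. As a hypothetical illustration (the zone label is not used elsewhere in this article), giving two nodes the same zone value makes them one domain for any rule whose topologyKey is zone:

kubectl label nodes node-0 zone=zone-a
kubectl label nodes node-1 zone=zone-a
# a podAffinity/podAntiAffinity term with topologyKey: zone now treats node-0 and node-1 as a single topology domain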

The three affinity and anti-affinity strategies are compared in the table below:

Strategy         Matches on    Supported operators                       Topology-aware   Purpose
nodeAffinity     node labels   In, NotIn, Exists, DoesNotExist, Gt, Lt   No               Decides which hosts a Pod may run on
podAffinity      Pod labels    In, NotIn, Exists, DoesNotExist           Yes              Decides which Pods this Pod may share a topology domain with
podAntiAffinity  Pod labels    In, NotIn, Exists, DoesNotExist           Yes              Decides which Pods this Pod must not share a topology domain with
  • In: the label's value is in the given list
  • NotIn: the label's value is not in the given list
  • Gt: the label's value is greater than the given value (see the sketch after this list)
  • Lt: the label's value is less than the given value
  • Exists: the label exists (values must be empty)
  • DoesNotExist: the label does not exist (values must be empty)
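A minimal, hypothetical nodeAffinity term combining Exists and Gt (the disktype and cpu-count label keys are assumptions used only for illustration):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype          # matches any node that carries a disktype label, whatever its value
          operator: Exists       # values must be left empty for Exists/DoesNotExist
        - key: cpu-count         # hypothetical numeric label, e.g. cpu-count=8
          operator: Gt
          values: ["4"]          # Gt/Lt compare against a single integer written as a string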

Use cases

Typical nodeAffinity use cases:

  • Deploy all Pods of service S1 onto the hosts that match a given label rule.
  • Deploy all Pods of service S1 onto any host except a few excluded ones.

Typical podAffinity use cases:

  • Deploy the Pods of a particular service into the same topology domain, without having to name the domain explicitly.
  • If service S1 calls service S2, co-locate the Pods of S1 and S2 in the same topology domain to reduce the network latency between them (or for other reasons).

Typical podAntiAffinity use cases:

  • Spread the Pods of one service across different hosts or topology domains to improve the availability of the service itself.
  • Give a Pod exclusive access to a node to guarantee resource isolation, so no other Pod shares the node's resources.
  • Spread the Pods of services that might interfere with each other across different hosts.

NodeSelector

First, give node-1 the label node=ssd:

[root@master-1 app]# kubectl label nodes node-1 node=ssd
node/node-1 labeled

[root@master-1 app]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master-1 Ready master 43d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master-1,node-role.kubernetes.io/master=
node-0 Ready <none> 43d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node-0
node-1 Ready <none> 43d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node-1,node=ssd

Now define a Pod that selects nodes carrying the node=ssd label:

[root@master-1 app]# vim node-selector-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  nodeSelector:
    node: ssd
[root@master-1 app]# kubectl apply -f node-selector-demo.yaml
[root@master-1 app]# kubectl get pod  -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
...
myapp 1/1 Running 0 14s 10.244.2.7 node-1 <none> <none>
...

As the output shows, once nodeSelector is set the Pod is only scheduled onto node-1, the node whose labels match. Note, however, that if no node satisfies the nodeSelector, the Pod stays in the Pending state until a matching node appears.
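As a quick way to observe that Pending behaviour (a hypothetical step, not part of the original walkthrough), remove the label again; the running Pod stays where it is, but any new Pod with this nodeSelector would sit in Pending until some node matches:

[root@master-1 app]# kubectl label nodes node-1 node-    # the trailing "-" removes the node=ssd label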

nodeAffinity

[root@master-1 app]# kubectl explain pod.spec.affinity.nodeAffinity 
KIND: Pod
VERSION: v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
Describes node affinity scheduling rules for the pod.

Node affinity is a group of node affinity scheduling rules.

FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node matches
the corresponding matchExpressions; the node(s) with the highest sum are
the most preferred.

requiredDuringSchedulingIgnoredDuringExecution <Object>
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to an update), the system may or may not try
to eventually evict the pod from its node.

  • requiredDuringSchedulingIgnoredDuringExecution: hard affinity; the rule must be satisfied.
    • matchExpressions: label match expressions. For example, a Pod can specify key zone, operator In and values foo and bar, which means it may only be scheduled onto nodes whose zone label is foo or bar.
    • matchFields: field match expressions, defined the same way, except they match node fields rather than node labels (a hypothetical sketch follows this list).
  • preferredDuringSchedulingIgnoredDuringExecution: soft affinity; satisfied if possible, ignored otherwise.
    • preference: a node selector term associated with the corresponding weight.
    • weight: a value in the range 1-100. Because this is a soft rule, not every term has to match; the more preference terms a node matches, the more weight it accumulates, and the node with the highest total is considered the best fit.
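matchFields is not demonstrated in this article; as a hedged sketch, the node field usually matched this way is metadata.name, so pinning a Pod to node-1 by field rather than by label would look roughly like this:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name     # a node field, not a node label
          operator: In
          values: ["node-1"]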

Hard affinity:

[root@master-1 app]# vim test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node
            operator: In
            values: ["foo","bar"]


[root@master-1 app]# kubectl apply -f test.yaml
pod/myapp created

[root@master-1 app]# kubectl get pod
NAME READY STATUS RESTARTS AGE
...
myapp 1/1 Running 0 2m17s
...

Soft affinity:

Compared with requiredDuringSchedulingIgnoredDuringExecution, note that preferredDuringSchedulingIgnoredDuringExecution is a list of objects, while the preference inside each entry is a single object.

[root@master-1 app]# vim test1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  namespace: default
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values: ["foo","bar"]
[root@master-1 app]# kubectl apply -f test1.yaml 
pod/myapp created
[root@master-1 app]# kubectl get pod
NAME READY STATUS RESTARTS AGE
...
myapp 1/1 Running 0 2m17s
...

podAffinity

A typical Pod-affinity scenario: the nodes of our Kubernetes cluster are spread across different regions or data centres, and service A and service B need to be deployed in the same region or the same data centre. That is when affinity scheduling is needed.

kubectl explain pod.spec.affinity.podAffinity shows that podAffinity works just like nodeAffinity: it comes in a hard and a soft form.

[root@master-1 app]# kubectl explain pod.spec.affinity.podAffinity
KIND: Pod
VERSION: v1

RESOURCE: podAffinity <Object>

DESCRIPTION:
Describes pod affinity scheduling rules (e.g. co-locate this pod in the
same node, zone, etc. as some other pod(s)).

Pod affinity is a group of inter pod affinity scheduling rules.

FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node has pods
which matches the corresponding podAffinityTerm; the node(s) with the
highest sum are the most preferred.

requiredDuringSchedulingIgnoredDuringExecution <[]Object>
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to a pod label update), the system may or
may not try to eventually evict the pod from its node. When there are
multiple elements, the lists of nodes corresponding to each podAffinityTerm
are intersected, i.e. all terms must be satisfied.

Common parameters

  • labelSelector: selects the group of Pods to be affine with.
  • namespaces: selects which namespaces the labelSelector is evaluated against.
  • topologyKey: narrows down the set of eligible nodes; its value can be any legal node label key. In large clusters, omitting this field or giving it a wrong value can cause serious performance and security problems, so its use is restricted as follows (a soft anti-affinity sketch follows this list):
    • For Pod affinity and hard Pod anti-affinity, topologyKey must not be empty.
    • For hard anti-affinity, topologyKey may only be kubernetes.io/hostname, unless the LimitPodHardAntiAffinityTopology admission controller is disabled or its implementation is changed.
    • For soft Pod anti-affinity, an empty topologyKey is allowed and means there is no restriction on node topology.
    • In every other case, topologyKey may be any legal label key.
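The examples later in this article only use the required (hard) form of podAntiAffinity. As a hedged sketch of the preferred (soft) form, each list entry wraps its term in podAffinityTerm and carries a weight (the app=nginx selector and the zone topologyKey here are assumptions for illustration):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["nginx"]
        topologyKey: zone        # soft anti-affinity is not limited to kubernetes.io/hostname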

Hard affinity:

Since we only have a single cluster here, with no notion of regions or data centres, we simply use the hostname as the topology domain (topologyKey) and have the Pods created on the same host.

[root@master-1 app]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master-1 Ready master 43d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master-1,node-role.kubernetes.io/master=
node-0 Ready <none> 43d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node-0
node-1 Ready <none> 43d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node-1
[root@master-1 app]# vim pod-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod1
  labels:
    name: podaffinity-myapp
    tier: service
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod2
  labels:
    tier: front
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - podaffinity-myapp
        topologyKey: kubernetes.io/hostname

[root@master-1 app]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
...
node-affinity-pod1 1/1 Running 0 7m7s 10.244.2.9 node-1 <none> <none>
node-affinity-pod2 1/1 Running 0 7m7s 10.244.2.10 node-1 <none> <none>
...

In the example above, the Pod node-affinity-pod2 must be scheduled onto a host that already runs at least one Pod carrying the label name=podaffinity-myapp.

What happens if we delete node-affinity-pod1 and then recreate the affinity Pod on its own? Will it still be scheduled?

[root@master-1 pki]# kubectl delete  pod node-affinity-pod1
pod "node-affinity-pod1" deleted

[root@master-1 app]# vim pod2-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod2
  labels:
    tier: front
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - podaffinity-myapp
        topologyKey: kubernetes.io/hostname

[root@master-1 app]# kubectl apply -f pod2-affinity-demo.yaml
pod/node-affinity-pod2 created

[root@master-1 app]# kubectl get pod
NAME READY STATUS RESTARTS AGE
...
node-affinity-pod2 0/1 Pending 0 5m57s
...

The Pod is stuck in Pending. This is because no node currently runs a Pod with the label name=podaffinity-myapp, and our rule is a hard one, so the Pod cannot be scheduled at all. Now let's schedule node-affinity-pod1 onto node-0 and see whether the affinity Pod follows it to node-0:

[root@master-1 app]# vim pod1-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod1
  labels:
    name: podaffinity-myapp
    tier: service
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  nodeSelector:
    kubernetes.io/hostname: node-0

[root@master-1 app]# kubectl apply -f pod1-affinity-demo.yaml
pod/node-affinity-pod1 created

[root@master-1 app]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
...
node-affinity-pod1 1/1 Running 0 13s 10.244.4.9 node-0 <none> <none>
node-affinity-pod2 1/1 Running 0 11m 10.244.4.10 node-0 <none> <none>
...

Here we used kubernetes.io/hostname as the topology domain, which means the Pod being scheduled must land on the same host as the target Pod, since both have to sit in the domain kubernetes.io/hostname=node-0. To illustrate the point, let's change the topologyKey to beta.kubernetes.io/os. The Pod being scheduled must still share a topology domain with the target Pod, and the target Pod carries the label beta.kubernetes.io/os=linux; all three of our nodes carry that label, so all three nodes belong to the same topology domain and the Pod may now be scheduled onto any of them (except the master, which is tainted). Whether nodes are in the same topology domain is decided by whether the node label named in topologyKey has the same value on each of them (a hypothetical sketch of the Deployment's affinity stanza appears after the output below):

[root@master-1 app]# kubectl get node --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master-1 Ready master 44d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master-1,node-role.kubernetes.io/master=
node-0 Ready <none> 44d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node-0
node-1 Ready <none> 44d v1.13.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node-1
[root@master-1 app]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ye-5664f956f8-4z26d 1/1 Running 0 8s 10.244.4.12 node-0 <none> <none>
nginx-ye-5664f956f8-htb86 1/1 Running 0 8s 10.244.4.13 node-0 <none> <none>
nginx-ye-5664f956f8-pd445 1/1 Running 0 8s 10.244.2.20 node-1 <none> <none>
nginx-ye-5664f956f8-sqsws 1/1 Running 0 8s 10.244.4.11 node-0 <none> <none>
nginx-ye-5664f956f8-wns5w 1/1 Running 0 8s 10.244.2.21 node-1 <none> <none>
node-affinity-pod1 1/1 Running 0 61m 10.244.2.13 node-1 <none> <none>
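The nginx-ye Deployment used for this experiment is not shown in the article; a hypothetical sketch of its affinity stanza with the changed topologyKey might look like this:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: name
          operator: In
          values: ["podaffinity-myapp"]
      topologyKey: beta.kubernetes.io/os   # every node carries beta.kubernetes.io/os=linux, so all nodes form one topology domain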

podAntiAffinity

Pod anti-affinity is used in exactly the same way as Pod affinity, only inverted. For example, if a node already runs an application-service Pod, then our database Pod should preferably not run on the same node; that is anti-affinity.

kubectl explain pod.spec.affinity.podAntiAffinity shows the same structure: a hard form and a soft form.

[root@master-1 app]# kubectl explain pod.spec.affinity.podAntiAffinity

KIND: Pod
VERSION: v1

RESOURCE: podAntiAffinity <Object>

DESCRIPTION:
Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
in the same node, zone, etc. as some other pod(s)).

Pod anti affinity is a group of inter pod anti affinity scheduling rules.

FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
anti-affinity expressions specified by this field, but it may choose a node
that violates one or more of the expressions. The node that is most
preferred is the one with the greatest sum of weights, i.e. for each node
that meets all of the scheduling requirements (resource request,
requiredDuringScheduling anti-affinity expressions, etc.), compute a sum by
iterating through the elements of this field and adding "weight" to the sum
if the node has pods which matches the corresponding podAffinityTerm; the
node(s) with the highest sum are the most preferred.

requiredDuringSchedulingIgnoredDuringExecution <[]Object>
If the anti-affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
anti-affinity requirements specified by this field cease to be met at some
point during pod execution (e.g. due to a pod label update), the system may
or may not try to eventually evict the pod from its node. When there are
multiple elements, the lists of nodes corresponding to each podAffinityTerm
are intersected, i.e. all terms must be satisfied.
[root@master-1 app]# vim test2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod1
  labels:
    name: podaffinity-myapp
    tier: service
spec:
  containers:
  - name: myapp
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  nodeSelector:
    kubernetes.io/hostname: node-1
# Hard anti-affinity scheduling
[root@master-1 app]# vim pod-antiaffinity-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ye
  labels:
    app: myapp
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        version: latest
    spec:
      containers:
      - name: myapp
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: name
                operator: In
                values: ["podaffinity-myapp"]
            topologyKey: kubernetes.io/hostname

The rule means: if a node already runs a Pod labelled name=podaffinity-myapp, then our Pods must not be scheduled onto that node. Since we pinned the name=podaffinity-myapp Pod onto node-1 above, none of the Deployment's Pods should appear on node-1:

[root@master-1 app]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ye-79c4778f6c-4h5ft 1/1 Running 0 15m 10.244.4.20 node-0 <none> <none>
nginx-ye-79c4778f6c-8tw9k 1/1 Running 0 15m 10.244.4.16 node-0 <none> <none>
nginx-ye-79c4778f6c-gpgxb 1/1 Running 0 15m 10.244.4.17 node-0 <none> <none>
nginx-ye-79c4778f6c-hbrfr 1/1 Running 0 15m 10.244.4.19 node-0 <none> <none>
nginx-ye-79c4778f6c-rbw5r 1/1 Running 0 15m 10.244.4.18 node-0 <none> <none>
node-affinity-pod1 1/1 Running 0 147m 10.244.2.13 node-1 <none> <none>

A more practical example

In a three-node cluster, a web application uses an in-memory cache such as redis. We want the web servers to be co-located with redis as much as possible.

The YAML below deploys a simple redis with three replicas and the label app=store. It also configures podAntiAffinity to make sure the scheduler never places two of the replicas on the same node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine

The YAML below deploys the web servers and configures both podAntiAffinity and podAffinity. The podAffinity rule tells the scheduler that every replica must be co-located with a Pod carrying the selector label app=store, while the podAntiAffinity rule ensures that no two web-server replicas end up on the same node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine

If we create both Deployments, our three-node cluster should look like this:

node-1        node-2        node-3
webserver-1   webserver-2   webserver-3
cache-1       cache-2       cache-3

As you can see, all three web-server replicas are automatically co-located with a cache replica, as expected.

$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
redis-cache-1450370735-6dzlj 1/1 Running 0 8m 10.192.4.2 kube-node-3
redis-cache-1450370735-j2j96 1/1 Running 0 8m 10.192.2.2 kube-node-1
redis-cache-1450370735-z73mh 1/1 Running 0 8m 10.192.3.1 kube-node-2
web-server-1287567482-5d4dz 1/1 Running 0 7m 10.192.2.3 kube-node-1
web-server-1287567482-6f7v5 1/1 Running 0 7m 10.192.4.3 kube-node-3
web-server-1287567482-s330j 1/1 Running 0 7m 10.192.3.2 kube-node-2

The example above uses a podAntiAffinity rule with topologyKey: "kubernetes.io/hostname" to deploy the redis cluster so that no two instances ever share the same host.
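One caveat the article does not spell out: with the required form, a fourth replica of either Deployment would stay Pending on this three-node cluster, since no node is left that satisfies the rule. A hedged variant that merely prefers spreading would swap the web-server anti-affinity for the soft form:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["web-store"]
        topologyKey: "kubernetes.io/hostname"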


Source: Mr.Ye, "Kubernetes之调度器和调度过程", published 2019-08-26, https://system51.github.io/2019/08/26/Kubernetes-Scheduler/