Preface:
Note (important): deployment differs across cluster versions. This article configures nodes by modifying the initial startup files, which requires restarting the docker and kubelet services, so proceed with caution! After version 1.21, node scheduling rules can be configured dynamically, managed in YAML form.
Official documentation
1. Limiting k8s node compute resources (startup-file modification, for k8s 1.7+): link
2. Node scheduling and eviction policy (dynamic node resource configuration, for k8s 1.21+): link
Reference: k8s node allocatable resource limits
kubectl api-versions
Used to confirm whether the current version supports dynamic node configuration;
it lists the API versions (and thus the resource types) available in the current cluster.
Prerequisites
docker and the kubelet must use the same cgroup driver (cgroupfs in this article).
1. First confirm docker's cgroup driver:
# docker info | grep "Cgroup Driver"
Cgroup Driver: cgroupfs
If docker's Cgroup Driver is not cgroupfs, change it as follows.
2. Modify the docker configuration (typically /etc/docker/daemon.json):
{
  "registry-mirrors": ["https://bk6kzfqm.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
The "exec-opts" line is the one to change.
3. Change the kubelet cgroup driver from systemd to cgroupfs
# vim /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2"
Set the --cgroup-driver parameter to cgroupfs.
4. Review all of the kubelet's configuration files
# /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
# vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs   ## change to cgroupfs
5. Restart docker and kubelet
# systemctl restart docker && systemctl restart kubelet
## If this errors out, troubleshoot with:
# systemctl status kubelet.service -l
# journalctl _PID=<pid>
Kubelet Node Allocatable: constraining node resources
1. Check the node's currently available resources
kubectl describe nodes <node_name>
...
Capacity:             ## total resources
  cpu:                2
  ephemeral-storage:  99561988Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4026372Ki
  pods:               110
Allocatable:          ## allocatable resources
  cpu:                2
  ephemeral-storage:  91756327989
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2868278679   # roughly 2.7Gi
  pods:               110
...
2. Concepts
Kubelet Node Allocatable reserves resources for Kube components and system daemons, so that they still have enough resources even when the node runs at full load.
Three resource types can currently be reserved: cpu, memory, and ephemeral-storage.
Node Capacity is the node's total hardware resources; kube-reserved is reserved for Kube components; system-reserved is reserved for system daemons; eviction-threshold is the threshold at which the kubelet starts evicting pods.
Allocatable is the value the scheduler actually uses when placing pods (it guarantees that the sum of the resource requests of all pods on the node does not exceed Allocatable).
Node Allocatable = Node Capacity - kube-reserved - system-reserved - eviction-threshold.
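The formula can be checked against the example node's memory with a quick calculation. This is only a sketch: it uses the Capacity figure from the kubectl describe output above and the reservations configured later in this article, and it assumes the 5% hard-eviction threshold is taken against total capacity.

```python
# Node Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold,
# computed for the example node's memory. All input values come from this article.

KI = 1024
MI = 1024 * 1024

capacity = 4026372 * KI                # memory Capacity from kubectl describe
kube_reserved = 1000 * MI              # --kube-reserved=memory=1000Mi
system_reserved = 1000 * MI            # --system-reserved=memory=1000Mi
eviction_hard = int(capacity * 0.05)   # --eviction-hard=memory.available<5%

allocatable = capacity - kube_reserved - system_reserved - eviction_hard
print(allocatable)  # close to the 1819702679 (about 1.7Gi) shown later
```

The result lands within a few bytes of the Allocatable value reported after the change, which confirms how the four terms combine.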
The modified /var/lib/kubelet/kubeadm-flags.env:
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2 \
--enforce-node-allocatable=pods,kube-reserved,system-reserved \
--kube-reserved-cgroup=/system.slice/kubelet.service \
--system-reserved-cgroup=/system.slice \
--kube-reserved=cpu=0,memory=1000Mi \
--system-reserved=cpu=1,memory=1000Mi \
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
--eviction-max-pod-grace-period=30 \
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi"
Parameter explanations:
--enforce-node-allocatable=pods,kube-reserved,system-reserved
Meaning: specifies which categories the kubelet enforces hard limits on. Possible values:
pods
kube-reserved   # resources reserved for Kube components: kubelet, kube-proxy, docker, etc.
system-reserved # resources reserved for system daemons
--kube-reserved-cgroup=/system.slice/kubelet.service
Meaning: specifies the cgroup used by the k8s system components.
Note: this cgroup and its subsystems must be created beforehand; the kubelet will not create them for you.
--system-reserved-cgroup=/system.slice
Meaning: specifies the cgroup used by the system daemons.
Note: this cgroup and its subsystems must be created beforehand; the kubelet will not create them for you.
--kube-reserved=cpu=1,memory=250Mi
Meaning: kube-reserved reserves resources only for Kube components that are not started as pods.
--system-reserved=cpu=200m,memory=250Mi
Meaning: the amount of resources reserved for system daemons (sshd, udev, etc.),
e.g. --system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=1Gi.
Note: besides what the system daemons need, you should also reserve some memory for the kernel and for user login sessions.
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10%
Meaning: sets the hard thresholds for pod eviction; only memory and disk signals are supported here.
Once memory is reserved via --eviction-hard, the kubelet starts evicting pods
as soon as the node's available memory drops below the reserved value.
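To see what the percentage thresholds mean in absolute terms, here is a quick conversion for the example node. This is a sketch that assumes the percentages are evaluated against the capacity figures shown earlier (memory 4026372Ki, ephemeral-storage 99561988Ki).

```python
# Convert the percentage-based hard eviction thresholds into absolute values.

KI = 1024

memory_capacity = 4026372 * KI    # memory capacity in bytes
nodefs_capacity = 99561988 * KI   # ephemeral-storage capacity in bytes

memory_hard = memory_capacity * 0.05   # memory.available<5%
nodefs_hard = nodefs_capacity * 0.10   # nodefs.available<10%

print(round(memory_hard / (1024 * 1024)))  # about 197 Mi
print(round(nodefs_hard / (1024 ** 3)))    # about 9 Gi
```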
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15%
Meaning: configures the soft thresholds for pod eviction.
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m
Meaning: how long a soft threshold must be continuously exceeded before eviction is triggered.
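The interplay between the soft threshold and its grace period can be sketched as follows. This is a hypothetical helper to illustrate the rule, not actual kubelet code.

```python
GRACE_PERIOD = 120  # --eviction-soft-grace-period=memory.available=2m, in seconds

def should_soft_evict(breach_started_at, now, still_breached):
    """Soft eviction fires only if the threshold has been continuously
    breached for at least the grace period."""
    return still_breached and (now - breach_started_at) >= GRACE_PERIOD

print(should_soft_evict(0, 60, True))    # False: breached for only 1m
print(should_soft_evict(0, 150, True))   # True: breached longer than 2m
print(should_soft_evict(0, 150, False))  # False: the signal recovered
```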
--eviction-max-pod-grace-period=30
Meaning: the maximum wait before evicting a pod = min(pod.Spec.TerminationGracePeriodSeconds, eviction-max-pod-grace-period), in seconds.
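As a tiny illustration of the min() rule (function and variable names are mine, not the kubelet's):

```python
EVICTION_MAX_POD_GRACE_PERIOD = 30  # seconds, from the flag above

def grace_period_for_eviction(termination_grace_period_seconds):
    # The kubelet caps the pod's own grace period at the flag value.
    return min(termination_grace_period_seconds, EVICTION_MAX_POD_GRACE_PERIOD)

print(grace_period_for_eviction(60))  # 30: capped by the kubelet flag
print(grace_period_for_eviction(10))  # 10: the pod's own setting is shorter
```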
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi
Meaning: the minimum amount of resources to reclaim per eviction.
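A sketch of why the minimum reclaim matters: once a hard threshold is breached, the kubelet frees at least this much beyond the threshold, so it is not immediately driven back into eviction. The 500Mi figure comes from the flags above; the absolute nodefs threshold used below is an illustrative assumption.

```python
MI = 1024 * 1024

nodefs_threshold = 10 * 1024 * MI   # assumed absolute nodefs threshold (10Gi)
minimum_reclaim = 500 * MI          # --eviction-minimum-reclaim=nodefs.available=500Mi

# Eviction stops only once available nodefs is back above this target:
reclaim_target = nodefs_threshold + minimum_reclaim
print(reclaim_target // MI)  # 10740 Mi
```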
3. Apply the changes
After adjusting the values to suit your nodes, save the file.
# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2 \
--enforce-node-allocatable=pods,kube-reserved,system-reserved \
--kube-reserved-cgroup=/system.slice/kubelet.service \
--system-reserved-cgroup=/system.slice \
--kube-reserved=cpu=0,memory=1000Mi \
--system-reserved=cpu=1,memory=1000Mi \
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
--eviction-max-pod-grace-period=30 \
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi"
Modify the kubelet startup service file /lib/systemd/system/kubelet.service (the ExecStartPre lines below pre-create the required cgroups):
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/

[Service]
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/hugetlb/system.slice/kubelet.service
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
4. Restart the docker and kubelet services, then check the node's Capacity and Allocatable again
# systemctl restart docker && systemctl restart kubelet
# kubectl describe nodes <node-name>
Addresses:
  InternalIP:  192.168.17.150
  Hostname:    k8s-01
Capacity:
  cpu:                2
  ephemeral-storage:  99561988Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4026372Ki
  pods:               110
Allocatable:
  cpu:                1
  ephemeral-storage:  91756327989
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1819702679   # roughly 1.7Gi
  pods:               110
Comparison:
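The drop in schedulable memory between the two describe outputs can be checked directly (values copied from the outputs above). Note also that the cpu Allocatable dropped from 2 to 1, matching --system-reserved=cpu=1.

```python
MI = 1024 * 1024

before = 2868278679  # Allocatable memory before the change, in bytes
after = 1819702679   # Allocatable memory after the change, in bytes

print((before - after) / MI)  # 1000.0: exactly 1000Mi less schedulable memory
```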
To repeat: this method requires restarting docker and kubelet, so be careful in production. After version 1.21 you can instead configure this dynamically with a YAML file,
along these lines:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "1Gi"
  imagefs.available: "100Gi"
evictionMinimumReclaim:
  memory.available: "0Mi"
  nodefs.available: "500Mi"
  imagefs.available: "2Gi"