一、导读
经过最近研究,发现对systemd如何利用cgroup的实例少之又少,而且,很多人搞不清,在el7上,如果想使用cgroup到底怎么使用?到底该如何systemd为一个进程或者服务利用cgroup?
本文,实战举例详解一个服务,是如何通过systemd来利用cgroup对cpu,memory,blockIO资源进行管理的。
但是,本文需要您对systemd中的cgroup概念有基本的了解,对systemd管理服务的方式有基本的了解。
另外,libcgroup也不能再rhel7上使用,使用的话,会造成不可预想的后果,取而代之的是systemd USING CONTROL GROUPS
WARNING The deprecated cgconfig tool from the libcgroup package is available to mount and handle hierarchies for controllers not yet supported by systemd (most notably the net-prio controller). Never use libcgropup tools to modify the default hierarchies mounted by systemd since it would lead to unexpected behavior. The libcgroup library will be removed in the future versions of Red Hat Enterprise Linux. For more information on how to use cgconfig, see Chapter 3, Using libcgroup Tools.
在rhel7后,libcgroup不再存在,被systemd取代,systemd提供了一些使用cgroup的方针,但是,没有做到全部,比如,你不能通过systemd使用cpuset ,freezer,cpuset or freezer are currently not exposed at all due to the broken inheritance semantics of the kernel logic. Also, migrating units to a different slice at runtime is not supported (i.e. altering the Slice= property for running units) as the kernel currently lacks atomic cgroup subtree moves.
How is it possible to safely achieve the same result (exclusive cpu access/arbitrary cpusets for userland applications) ?
虽然,systemd不支持cpuset,但是相信以后会支持的,另外,现在有一个略显笨拙,但是可以实现同样的目标的方法:请见下文介绍
二、实战
第一步:创建slice,service
使用systemd创建启动一个cc.service
[root@localhost /home/ahao.mah/systemd] #cat /usr/libexec/cc.py #!/usr/bin/python while True: pass
[root@localhost /home/ahao.mah/systemd] #chmod +x /usr/libexec/cc.py
创建cc.service的unit文件
[root@localhost /home/ahao.mah/systemd] #vim /etc/systemd/system/cc.service [Unit] Description=cc ConditionFileIsExecutable=/usr/libexec/cc.py [Service] Type=simple ExecStart=/usr/libexec/cc.py [Install] WantedBy=multi-user.target
启动cc服务
[root@localhost /home/ahao.mah/systemd] #systemctl restart cc
[root@localhost /home/ahao.mah/systemd] #systemctl status cc ● cc.service - cc Loaded: loaded (/etc/systemd/system/cc.service; disabled; vendor preset: disabled) Active: active (running) since Fri 2016-08-26 11:00:12 CST; 5s ago Main PID: 33542 (cc.py) CGroup: /system.slice/cc.service └─33542 /usr/bin/python /usr/libexec/cc.py Aug 26 11:00:12 localhost systemd[1]: Started cc. Aug 26 11:00:12 localhost systemd[1]: Starting cc...
cc服务跑满了cpu
[root@localhost /home/ahao.mah/systemd] #top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 33542 root 20 0 125320 4532 2032 R 100.0 0.0 0:23.39 cc.py
[root@localhost /home/ahao.mah/systemd] #mpstat -P ALL 1 10 Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 08/26/2016 _x86_64_ (24 CPU) 12:10:30 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 12:10:31 PM all 4.16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 95.84 12:10:31 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:10:31 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 12:10:31 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
你会发现cc.service是通过systemd启动的,所以,执行systemd-cgls,将会在system.slice下面。如果你不经过systemd执行/usr/libexec/cc.py,那么,执行systemd-cgls,这个进程将属于cgroup树的user.slice下。
[root@localhost /home/ahao.mah/systemd] #systemd-cgls └─system.slice ├─cc.service │ └─35480 /usr/bin/python /usr/libexec/cc.py
第二步:使用crgoup控制进程资源
首先,判断cc服务,属于cgroup树的哪个分支,很明显,我们既然没有在配置改变过,那么cc服务,一定属于system.slice
[root@localhost /home/ahao.mah/systemd] #systemctl show cc Slice=system.slice ControlGroup=/system.slice/cc.service
修改服务,所属slice
[root@localhost /home/ahao.mah/systemd] #vim /etc/systemd/system/cc.service [Unit] Description=cc ConditionFileIsExecutable=/usr/libexec/cc.py [Service] Type=simple ExecStart=/usr/libexec/cc.py Slice=jiangyi.slice [Install] WantedBy=multi-user.target
[root@localhost /home/ahao.mah/systemd] #systemd-cgl ├─jiangyi.slice │ └─cc.service │ └─37720 /usr/bin/python /usr/libexec/cc.py
然而,此时,我们并没有为jiangyi.slice使用cgroup
[root@localhost /home/ahao.mah/systemd] #lscgroup |grep jiangyi.slice [root@localhost /home/ahao.mah/systemd] #lscgroup |grep cc.service
在/etc/systemd/system/cc.service中添加CPUAccounting=yes。这是在宣布,jiangyi.slice,和jiangyi.slice下的cc.service,都将开始使用cgroup的cpu,cpuacct这个资源管理。
[root@localhost /home/ahao.mah/systemd] #lscgroup |grep jiangyi cpu,cpuacct:/jiangyi.slice cpu,cpuacct:/jiangyi.slice/cc.service [root@localhost /home/ahao.mah/systemd] #lscgroup |grep cc.service cpu,cpuacct:/jiangyi.slice/cc.service
然而,此时cc.service依然占用了cpu的100%,如下,都是这2个参数的默认值。其中,可以用 cpu.cfs_period_us 和 cpu.cfs_quota_us 来限制该组中的所有进程在单位时间里可以使用的 cpu 时间。这里的 cfs 是完全公平调度器的缩写。cpu.cfs_period_us 就是时间周期,默认为 100000,即百毫秒。cpu.cfs_quota_us 就是在这期间内可使用的 cpu 时间,默认 -1,即无限制。
[root@localhost /home/ahao.mah/systemd] #cat /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.cfs_period_us 100000 [root@localhost /home/ahao.mah/systemd] #cat /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.cfs_quota_us -1
所以,只要执行如下2步,cc.service的cpu占用率就会立刻跌倒50%。
[root@localhost /home/ahao.mah/systemd] #ps aux | grep cc root 39402 99.8 0.0 125320 4536 ? Rs 11:21 5:38 /usr/bin/python /usr/libexec/cc.py
echo 50000 > /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.cfs_quota_us echo 39402 > /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/tasks
[root@localhost /home/ahao.mah/systemd] #top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 39402 root 20 0 125320 4536 2032 R 50.2 0.0 7:57.40 cc.py
下面,开始考虑,如何通过systemd的unit文件,利用cgroup管理资源呢?
第三步:systemd控制crgoup
systemd是如何使用cgroup的,这个问题困扰了很多的同学,systemd其实是通过UNIT文件的配置,来使用cgroup的功能的,比如,使得cc.srevice利用cgroup的cpu,memory,blockIO的资源管理;需要的参数分别是:CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes
那么,这些参数,在#man systemd.resource-control中,有详细的解释。
举例:
[root@localhost /home/ahao.mah/systemd] #cat /etc/systemd/system/cc.service [Unit] Description=cc ConditionFileIsExecutable=/usr/libexec/cc.py [Service] Type=simple ExecStart=/usr/libexec/cc.py Slice=jiangyi.slice CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes [Install] WantedBy=multi-user.target
检查cgroup树中是否存在我们的cc.service,jiangyi.slice
[root@localhost /home/ahao.mah/systemd] #lscgroup |grep jiangyi cpu,cpuacct:/jiangyi.slice cpu,cpuacct:/jiangyi.slice/cc.service blkio:/jiangyi.slice blkio:/jiangyi.slice/cc.service memory:/jiangyi.slice memory:/jiangyi.slice/cc.service
[root@localhost /home/ahao.mah/systemd] #lscgroup |grep cc.service cpu,cpuacct:/jiangyi.slice/cc.service blkio:/jiangyi.slice/cc.service memory:/jiangyi.slice/cc.service
cgroup的信息,在systemctl status cc中也是有体现的。
[root@localhost /home/ahao.mah/systemd] #systemctl status cc ● cc.service - cc Loaded: loaded (/etc/systemd/system/cc.service; disabled; vendor preset: disabled) Active: active (running) since Fri 2016-08-26 14:18:28 CST; 24s ago Main PID: 84861 (cc.py) Memory: 2.5M CGroup: /jiangyi.slice/cc.service └─84861 /usr/bin/python /usr/libexec/cc.py Aug 26 14:18:28 localhost systemd[1]: Started cc. Aug 26 14:18:28 localhost systemd[1]: Starting cc...
三、实际应用
3.1 限制cpu:cpu.shares
cc.service
[root@localhost /root] #systemctl cat cc # /etc/systemd/system/cc.service [Unit] Description=cc ConditionFileIsExecutable=/usr/libexec/cc.py [Service] Type=simple ExecStart=/usr/libexec/cc.py Slice=jiangyi.slice CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes [Install] WantedBy=multi-user.target
ee.service
[root@localhost /root] #systemctl cat ee # /etc/systemd/system/ee.service [Unit] Description=ee ConditionFileIsExecutable=/usr/libexec/ee.py [Service] Type=simple ExecStart=/usr/libexec/cc.py Slice=jiangyi.slice CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes [Install] WantedBy=multi-user.target
默认:cpu.shares都是1024
[root@localhost /root] #cat /sys/fs/cgroup/cpu/jiangyi.slice/cpu.shares 1024 [root@localhost /root] #cat /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.shares 1024 [root@localhost /root] #cat /sys/fs/cgroup/cpu/jiangyi.slice/ee.service/cpu.shares 1024
mpstat -P ALL 1 2:跑慢了2个cpu core
[root@localhost /root] #mpstat -P ALL 1 2 Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 09/18/2016 _x86_64_ (24 CPU) 08:32:09 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 08:32:10 PM all 8.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 91.67 08:32:10 PM 0 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01 08:32:10 PM 1 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01 08:32:10 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 20 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 08:32:10 PM 21 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 08:32:10 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 08:32:10 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
cpu.shares 不是限制进程能使用的绝对的 cpu 时间,而是控制各个组之间的配额
这里先参考一下:用 cgroups 管理 cpu 资源
3.2 限制cpu:CPUQuota=40%
如下,仅仅CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes,打开这些统计不行,我们还要限制service对资源的使用;
[root@localhost /home/ahao.mah/systemd] #cat /etc/systemd/system/cc.service [Unit] Description=cc ConditionFileIsExecutable=/usr/libexec/cc.py [Service] Type=simple ExecStart=/usr/libexec/cc.py Slice=jiangyi.slice CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes [Install] WantedBy=multi-user.target
前面看到了,cc.service吃掉了一个cpu的100%,现在我们就限制它,新增参数:CPUQuota=40%
[root@localhost /root] #cat /etc/systemd/system/cc.service [Unit] Description=cc ConditionFileIsExecutable=/usr/libexec/cc.py [Service] Type=simple ExecStart=/usr/libexec/cc.py Slice=jiangyi.slice CPUAccounting=yes CPUQuota=40% MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes [Install] WantedBy=multi-user.target
[root@localhost /root] #systemctl daemon-reload
[root@localhost /root] #systemctl restart cc.service
如下,你会发现,cc.service最多可以占用40%的单个cpu;
[root@localhost /root] #mpstat -P ALL 1 3 Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 09/18/2016 _x86_64_ (24 CPU) 05:28:43 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 05:28:44 PM all 1.75 0.00 0.08 0.00 0.00 0.00 0.00 0.00 0.00 98.17 05:28:44 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 4 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01 05:28:44 PM 5 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01 05:28:44 PM 6 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 60.00 05:28:44 PM 7 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00 05:28:44 PM 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 10 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.01 05:28:44 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 13 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01 05:28:44 PM 14 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01 05:28:44 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 16 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00 05:28:44 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:28:44 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
3.3 限制memory
内存蹭蹭蹭
[root@localhost /root] #cat /usr/libexec/dd #!/usr/bin/bash x="a" while [ True ];do x=$x$x done;
内存蹭蹭蹭到了2G
[root@localhost /root] #systemctl status dd ● dd.service - dd Loaded: loaded (/etc/systemd/system/dd.service; disabled; vendor preset: disabled) Active: active (running) since Sun 2016-09-18 17:58:01 CST; 59s ago Main PID: 53549 (dd) Memory: 2.0G CGroup: /jiangyi.slice/dd.service └─53549 /usr/bin/bash /usr/libexec/dd
[root@localhost /root] #pid=`ps -ef|grep cc|grep -v grep |awk '{print $2}'` ; vmrss=`cat /proc/${pid}/status|grep -i VmRSS|awk '{print $2}'`;vmrss_m=$(($vmrss/1024));echo $vmrss_m 4
限制最多使用内存200M
[root@localhost /root] #cat /etc/systemd/system/dd.service [Unit] Description=dd ConditionFileIsExecutable=/usr/libexec/cc.py [Service] Type=simple ExecStart=/usr/libexec/dd Slice=jiangyi.slice CPUAccounting=yes CPUQuota=40% MemoryAccounting=yes MemoryMax=100M MemoryLimit=200M TasksAccounting=yes BlockIOAccounting=yes [Install] WantedBy=multi-user.target
[root@localhost /root] #cat /sys/fs/cgroup/memory/jiangyi.slice/dd.service/memory.limit_in_bytes 209715200
如下,效果很明显,发现MemoryMax=100M(最新款)没有生效,MemoryLimit=200M(老款)生效了,那是因为,MemoryMax应该是 cgroup-v2的参数
[root@localhost /root] #systemctl status dd ● dd.service - dd Loaded: loaded (/etc/systemd/system/dd.service; disabled; vendor preset: disabled) Active: active (running) since Sun 2016-09-18 19:44:42 CST; 27s ago Main PID: 82182 (dd) Memory: 199.8M (limit: 200.0M) CGroup: /jiangyi.slice/dd.service └─82182 /usr/bin/bash /usr/libexec/dd
观察了一会儿,没有被立刻OOM kill掉,大概等了一会儿,才被kill掉;
[root@localhost /root] #systemctl status dd ● dd.service - dd Loaded: loaded (/etc/systemd/system/dd.service; disabled; vendor preset: disabled) Active: failed (Result: signal) since Sun 2016-09-18 20:00:06 CST; 10min ago Process: 84350 ExecStart=/usr/libexec/dd (code=killed, signal=KILL) Main PID: 84350 (code=killed, signal=KILL)
查看日志,确实被OOM kill掉了
Sep 18 20:18:35 jiangyi02 kernel: dd invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Sep 18 20:18:35 jiangyi02 kernel: dd cpuset=/ mems_allowed=0 Sep 18 20:18:35 jiangyi02 kernel: CPU: 0 PID: 89722 Comm: dd Not tainted 3.10.0-327.alx2000.alxos7.x86_64 #1 Sep 18 20:18:35 jiangyi02 kernel: Task in /jiangyi.slice/dd.service killed as a result of limit of /jiangyi.slice/dd.service Sep 18 20:18:35 jiangyi02 kernel: Memory cgroup stats for /jiangyi.slice/dd.service: cache:0KB rss:204800KB rss_huge:0KB mapped_file:0KB swap:2097064KB inactive_anon:102532KB active_anon:102268KB inactive_file:0KB active_file:0KB unevictable:0KB Sep 18 20:18:35 jiangyi02 kernel: [89722] 0 89722 684180 51453 1138 524310 0 dd Sep 18 20:18:35 jiangyi02 kernel: Memory cgroup out of memory: Kill process 89722 (dd) score 972 or sacrifice child Sep 18 20:18:35 jiangyi02 kernel: Killed process 89722 (dd) total-vm:2736720kB, anon-rss:204528kB, file-rss:1284kB Sep 18 20:18:35 jiangyi02 systemd[1]: dd.service: main process exited, code=killed, status=9/KILL Sep 18 20:18:35 jiangyi02 systemd[1]: Unit dd.service entered failed state. Sep 18 20:18:35 jiangyi02 systemd[1]: dd.service failed.
如下,是MemoryMax=bytes(新款上市) MemoryLimit=bytes(老款)的解释,
- MemoryMax=bytes 绝对刚性的限制该单元中的进程最多可以使用多少内存。这是一个不允许突破的刚性限制,触碰此限制会导致进程由于内存不足而被强制杀死。建议将 MemoryHigh= 用作主要的内存限制手段, 而将 MemoryMax= 用作不可突破的底线。
选项值可以是以字节为单位的绝对内存大小(可以使用以1024为基数的 K, M, G, T 后缀), 也可以是以百分比表示的相对内存大小(相对于系统的全部物理内存), 还可以设为特殊值 “infinity” 表示不作限制。此选项控制着cgroup的 “memory.max” 属性值,详见 cgroup-v2.txt 文档。
此选项隐含着 “MemoryAccounting=true”
此选项是新式资源控制选项。相当于旧式的 MemoryLimit= 选项。
- MemoryLimit=bytes 绝对刚性的限制该单元中的进程最多可以使用多少内存。这是一个不允许突破的刚性限制,触碰此限制会导致进程由于内存不足而被强制杀死。选项值可以是以字节为单位的绝对内存大小(可以使用以1024为基数的 K, M, G, T 后缀), 也可以是以百分比表示的相对内存大小(相对于系统的全部物理内存), 还可以设为特殊值 “infinity” 表示不作限制。此选项控制着cgroup的 “memory.limit_in_bytes” 属性值, 详见 memory.txt 文档。
此选项隐含着 “MemoryAccounting=true”
此选项是旧式资源控制选项。建议使用新式的 MemoryMax= 选项。
除了MemoryLimit=bytes限制内存的参数外,还有其它的参数,可以参看这里:看man手册
限制How to use cgroup cpusets with systemd in RHEL7?
使用cpuset去指定一个service的cpu,目前systemd不支持,所以man手册里也没有;
However, currently the cpuset interface is not exposed through system and, so as far as we can tell, this functionalxty is now not available: [2][3] Note that the number of cgroup attributes currently exposed as unit properties is limited. This will be extended later on, as their kernel interfaces are cleaned up. For example cpuset or freezer are currently not exposed at all due to the broken inheritance semantics of the kernel logic. Also, migrating units to a different slice at runtime is not supported (i.e. altering the Slice= property for running units) as the kernel currently lacks atomic cgroup subtree moves.
How is it possible to safely achieve the same result (exclusive cpu access/arbitrary cpusets for userland applications) ?
不过我们还是有一个变相的方法:如下方法摘自How to use cgroup cpusets with systemd in RHEL7?
[root@localhost /root] #cat /usr/libexec/ff.py #!/usr/bin/python while True: pass
[root@localhost /root] #vim /usr/lib/systemd/system/ff.service [Unit] Description=ff After=syslog.target network.target auditd.service [Service] ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuset/mygroup1 ===> Create group to manage process ExecStartPre=/bin/bash -c '/usr/bin/echo "4" > /sys/fs/cgroup/cpuset/mygroup1/cpuset.cpus' ==> Assign cpu core 3 to this process ExecStartPre=/bin/bash -c '/usr/bin/echo "0" > /sys/fs/cgroup/cpuset/mygroup1/cpuset.mems' ExecStart=/usr/libexec/ff.py ===> Run this process ExecStartPost=/bin/bash -c '/usr/bin/echo $MAINPID > /sys/fs/cgroup/cpuset/mygroup1/tasks' ==> Assign process id to group task file ExecStopPost=/usr/bin/rmdir /sys/fs/cgroup/cpuset/mygroup1 ==> At the time of stop remove group Restart=on-failure [Install] WantedBy=multi-user.target
[root@localhost /root] #systemctl daemon-reload
[root@localhost /root] #systemctl restart ff
[root@localhost /root] #mpstat -P ALL 1 2 Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 09/18/2016 _x86_64_ (24 CPU) 09:15:50 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 09:15:51 PM all 4.20 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 95.76 09:15:51 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 4 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 09:15:51 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 09:15:51 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
四、额外须知:systemd-run
使用systemd-run,创建临时cgroup,举例:创建一个service:toptest.service并且在test.slice下。
[root@localhost /home/ahao.mah/systemd] #systemd-run --unit=toptest --slice=test top -b Running as unit toptest.service.
├─test.slice │ └─toptest.service │ └─34670 /usr/bin/top -b
临时的,也就是意味着UNIT都是临时的。
#systemctl cat toptest.service # /run/systemd/system/toptest.service # Transient stub # /run/systemd/system/toptest.service.d/50-Description.conf [Unit] Description=/usr/bin/top -b # /run/systemd/system/toptest.service.d/50-ExecStart.conf [Service] ExecStart= ExecStart=@/usr/bin/top "/usr/bin/top" "-b" # /run/systemd/system/toptest.service.d/50-Slice.conf [Service] Slice=test.slice
[root@localhost /home/ahao.mah/systemd] #lscgroup |grep test.slice cpu,cpuacct:/--slice\x3dtest.slice cpu,cpuacct:/test.slice blkio:/--slice\x3dtest.slice blkio:/test.slice memory:/--slice\x3dtest.slice memory:/test.slice
[root@localhost /home/ahao.mah/systemd] #lscgroup |grep toptest.service
[root@localhost /home/ahao.mah/systemd] #systemctl status toptest.service ● toptest.service - /usr/bin/top -b Loaded: loaded (/run/systemd/system/toptest.service; static; vendor preset: disabled) Drop-In: /run/systemd/system/toptest.service.d └─50-Description.conf, 50-ExecStart.conf, 50-Slice.conf Active: active (running) since Fri 2016-08-26 11:04:41 CST; 3h 29min ago Main PID: 34670 (top) CGroup: /test.slice/toptest.service └─34670 /usr/bin/top -b
[root@localhost /home/ahao.mah/systemd] #ll /sys/fs/cgroup/cpu/test.slice/ cgroup.clone_children cpuacct.proc_stat cpuacct.usage_percpu cpu.rt_period_us cpu.stat cgroup.event_control cpuacct.stat cpu.cfs_period_us cpu.rt_runtime_us notify_on_release cgroup.procs cpuacct.usage cpu.cfs_quota_us cpu.shares tasks
五、日常运维
停止一个service
#systemctl kill toptest.service --signal=SIGTERM
命令列界面设定参数
systemctl set-property httpd.service CPUShares=600 MemoryLimit=500M
希望此更改为临时更改,请添加 –runtime
systemctl set-property --runtime name property=value
cgroup 动态描述
systemd-cgtop
六、参考
- Linux Programmer’s Manual CGROUPS(7)
- The New Control Group Interfaces @freedesktop.org
- 7u官网文档
关注【ALiDataOps】
数据智能运维时代与你同行
微信扫一扫
关注该公众号
:,。视频小程序赞,轻点两下取消赞在看,轻点两下取消在看