
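Question about multiple Kata containers and NIC passthrough

I'm not sure whether this is related to Kata. I'm new to both Kata and device passthrough with VFIO, so I'm having a hard time understanding the root cause.

I have a workstation with three Intel 82580 Gigabit network cards. Each card has 4 ports, giving 12 NICs/ports in total.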

sudo lshw -class network -businfo -numeric
Bus info          Device          Class          Description
pci@0000:01:00.0  enp1s0f0        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:01:00.1  enp1s0f1        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:01:00.2  enp1s0f2        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:01:00.3  enp1s0f3        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:02:00.0  enp2s0f0        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:02:00.1  enp2s0f1        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:02:00.2  enp2s0f2        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:02:00.3  enp2s0f3        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:03:00.0  enp3s0f0        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:03:00.1  enp3s0f1        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:03:00.2  enp3s0f2        network        82580 Gigabit Network Connection [8086:150E]
pci@0000:03:00.3  enp3s0f3        network        82580 Gigabit Network Connection [8086:150E]

I want to create 12 Kata containers, each of which gets one of the physical NICs through device passthrough.

Here is how I configure device passthrough with VFIO:

$ NIC=enp1s0f0
$ BDF=$(sudo lshw -class network -businfo -numeric | grep ${NIC} | awk '{print $1;}' | cut -d@ -f2)
$ sudo echo $BDF | sudo tee /sys/bus/pci/devices/$BDF/driver/unbind
$ sudo lspci -n -s $BDF
0000:01:00.0 0200: 8086:150e (rev 01)

$ sudo echo 8086 150e | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
$ sudo echo 8086 150e | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id

and repeat for all devices
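
The "repeat for all devices" step could be scripted roughly as below. This is only a sketch: `list_bdfs` is a hypothetical helper name (not part of lshw or Kata) that reads `lshw -businfo -numeric` output on standard input and prints the PCI address (BDF) of every line carrying the given vendor:device ID. It reuses the same grep/awk/cut pipeline as the commands above.

```shell
# Hypothetical helper: print the BDF of every device whose lshw line
# contains the given vendor:device ID, e.g. 8086:150e.
# Reads `lshw -class network -businfo -numeric` output on stdin.
list_bdfs() {
    # -i: lshw prints the ID in upper case (150E), lspci in lower case.
    # -F: treat "[8086:150e]" as a literal string, not a bracket expression.
    grep -i -F "[$1]" | awk '{print $1}' | cut -d@ -f2
}

# Example use (needs root and the real hardware, so it is left commented out):
#   sudo lshw -class network -businfo -numeric | list_bdfs 8086:150e |
#   while read -r BDF; do
#       echo "$BDF" | sudo tee "/sys/bus/pci/devices/$BDF/driver/unbind"
#   done
```

The loop only covers the unbind step; the `new_id` and `remove_id` writes from the commands above would still follow it.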

This then gives the following list of all devices:

ll /dev/vfio/
total 0
drwxr-xr-x  2 root root      300 Mar 29 10:35 ./
drwxr-xr-x 18 root root     4340 Mar 29 10:35 ../
crw-------  1 root root 241,   0 Mar 29 10:35 10
crw-------  1 root root 241,   1 Mar 29 10:35 11
crw-------  1 root root 241,   2 Mar 29 10:35 12
crw-------  1 root root 241,   3 Mar 29 10:35 13
crw-------  1 root root 241,   4 Mar 29 10:35 14
crw-------  1 root root 241,   5 Mar 29 10:35 15
crw-------  1 root root 241,   6 Mar 29 10:35 16
crw-------  1 root root 241,   7 Mar 29 10:35 17
crw-------  1 root root 241,   8 Mar 29 10:35 18
crw-------  1 root root 241,   9 Mar 29 10:35 19
crw-------  1 root root 241,  10 Mar 29 10:35 20
crw-------  1 root root 241,  11 Mar 29 10:35 21
crw-rw-rw-  1 root root  10, 196 Mar 29 10:28 vfio
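
Each numbered entry under /dev/vfio/ is an IOMMU group, and the kernel exposes the group of a PCI device as the `iommu_group` symlink in that device's sysfs directory. A small sketch for looking up which group a given BDF belongs to follows. `group_of_bdf` and the `SYSFS_ROOT` variable are hypothetical names introduced here for illustration; `SYSFS_ROOT` exists only so the function can be pointed at a fake tree and defaults to /sys.

```shell
# Hypothetical helper: print the IOMMU group number of a PCI device.
# $1 is a full BDF such as 0000:03:00.0.
# SYSFS_ROOT is an assumption of this sketch (defaults to /sys).
group_of_bdf() {
    local link
    # iommu_group is a symlink ending in .../iommu_groups/<N>
    link=$(readlink "${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/iommu_group") || return 1
    basename "$link"
}

# Example use on the real machine:
#   group_of_bdf 0000:03:00.0    # prints the N of the matching /dev/vfio/N
```

Running this for all 12 BDFs would show which NIC sits behind each /dev/vfio entry, which helps when matching a failing container to a physical port.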

Then I start each Kata container (1-7):

sudo nerdctl run --cgroup-manager cgroupfs --runtime "io.containerd.kata.v2" --cap-add=CAP_NET_ADMIN -d --device /dev/vfio/11 --name tga1 ubuntu:latest sleep infinity
...
sudo nerdctl run --cgroup-manager cgroupfs --runtime "io.containerd.kata.v2" --cap-add=CAP_NET_ADMIN -d --device /dev/vfio/17 --name tga7 ubuntu:latest sleep infinity
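
For reference, the series of launch commands could be generated with a loop such as the one below. It is a sketch under one assumption inferred from the commands shown in this post: container tgaN is given /dev/vfio/(10+N), as in tga1 with group 11 and tga7 with group 17. `print_launch_cmds` is a hypothetical name, and the function only prints the commands rather than running them.

```shell
# Hypothetical helper: print the nerdctl commands for containers $1..$2.
# Assumption: container tgaN uses VFIO group 10+N, as in the commands above.
print_launch_cmds() {
    local i
    for i in $(seq "$1" "$2"); do
        printf 'sudo nerdctl run --cgroup-manager cgroupfs --runtime "io.containerd.kata.v2" --cap-add=CAP_NET_ADMIN -d --device /dev/vfio/%d --name tga%d ubuntu:latest sleep infinity\n' \
            "$((10 + i))" "$i"
    done
}

# Example use: review the output first, then pipe it to `sh` to run it.
#   print_launch_cmds 1 7
```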

So far everything goes well, for Kata containers 1-7. But when I try to start Kata containers 8-12, it fails:

sudo nerdctl run --cgroup-manager cgroupfs --runtime "io.containerd.kata.v2" --cap-add=CAP_NET_ADMIN -d --device /dev/vfio/18 --name tga8 ubuntu:latest sleep infinity
FATA[0001] failed to create shim task: QMP command failed: Device 'vfio-638fa5a1eac4abed0' not found: not found

The containerd log is as follows:

Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.121740023Z" level=error msg="VFIO_MAP_DMA failed: Bad address" name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.182792456Z" level=error msg="failed to hotplug VFIO device" error="QMP command failed: Device 'vfio-638fa5a1eac4abed0' not found" name=containerd-shim-v2 pid=3629 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers subsystem=sandbox vfio-device-BDF="03:00.0" vfio-device-ID=vfio-638fa5a1eac4abed0
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.182952308Z" level=error msg="Failed to add device" error="QMP command failed: Device 'vfio-638fa5a1eac4abed0' not found" name=containerd-shim-v2 pid=3629 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers subsystem=device
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.183018247Z" level=error msg="container create failed" container=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 error="QMP command failed: Device 'vfio-638fa5a1eac4abed0' not found" name=containerd-shim-v2 pid=3629 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers subsystem=container
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.183155694Z" level=warning error="no such file or directory" name=containerd-shim-v2 pid=3629 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 share-dir=/run/kata-containers/shared/sandboxes/bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585/mounts/bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585/rootfs source=virtcontainers subsystem=mount
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.183248451Z" level=warning msg="Could not remove container share dir" error="no such file or directory" name=containerd-shim-v2 pid=3629 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 share-dir=/run/kata-containers/shared/sandboxes/bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585/mounts/bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers subsystem=fs_share
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.186362442Z" level=error msg="qemu-system-x86_64: Failed to write msg. Wrote -1 instead of 20." name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.187017225Z" level=error msg="qemu-system-x86_64: Failed to set msg fds." name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.187118664Z" level=error msg="qemu-system-x86_64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)" name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.187205270Z" level=error msg="qemu-system-x86_64: Failed to set msg fds." name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.187560148Z" level=error msg="qemu-system-x86_64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)" name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.187841618Z" level=error msg="qemu-system-x86_64: Failed to set msg fds." name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.187930071Z" level=error msg="qemu-system-x86_64: vhost_set_vring_call failed: Invalid argument (22)" name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.188174089Z" level=error msg="qemu-system-x86_64: Failed to set msg fds." name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.188280375Z" level=error msg="qemu-system-x86_64: vhost_set_vring_call failed: Invalid argument (22)" name=containerd-shim-v2 pid=3629 qemuPid=3640 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=virtcontainers/hypervisor subsystem=qemu
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.407737009Z" level=warning msg="failed to cleanup network" error="failed to get netns /var/run/netns/cnitest-c9473abe-8a0e-c3db-97fc-587148b1e378: failed to Statfs "/var/run/netns/cnitest-c9473abe-8a0e-c3db-97fc-587148b1e378": no such file or directory" id=/var/run/netns/cnitest-c9473abe-8a0e-c3db-97fc-587148b1e378 name=containerd-shim-v2 pid=3629 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=katautils
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.409331737Z" level=info msg="shim disconnected" id=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.409378160Z" level=warning msg="cleaning up after shim disconnected" id=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 namespace=default
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.409385164Z" level=info msg="cleaning up dead shim"
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.425715930Z" level=error msg="failed to delete" cmd="/usr/bin/containerd-shim-kata-v2 -namespace default -address /run/containerd/containerd.sock -publish-binary /usr/local/bin/containerd -id bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 -bundle /run/containerd/io.containerd.runtime.v2.task/default/bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 delete" error="exit status 1"
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.425779746Z" level=warning msg="failed to clean up after shim disconnected" error="time="2023-03-29T10:45:40Z" level=warning msg="failed to cleanup container" container=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 error="open /run/vc/sbs/bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585: no such file or directory" name=containerd-shim-v2 pid=3774 sandbox=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 source=containerd-kata-shim-v2\nio.containerd.kata.v2: open /run/vc/sbs/bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585: no such file or directory: exit status 1" id=bd409e48d98ff9817b98bf29244980b955305d5524abbf722d3cae7072682585 namespace=default
Mar 29 10:45:40 wse-c0260 containerd[874]: time="2023-03-29T10:45:40.425835086Z" level=error msg="copy shim log" error="read /proc/self/fd/45: file already closed"

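So I really don't understand what is going on. I have tried searching the internet but have not found anything that solves my problem. I'm not sure whether this is related to VFIO and device passthrough or to Kata, since both areas are new to me.

If anyone could give me some input to help me move forward, I would be very grateful.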

Versions:
Ubuntu 22.04
Kata 3.1.0
containerd 1.6.18
nerdctl 1.2.1

Original asker: GitHub user tse77. For further feedback on the project, please file an issue on GitHub: https://github.com/kata-containers/kata-containers/issues

Posted by 码字王 on 2023-05-17 16:09:06
1 answer
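  • I don't know what is going on, but here are a few remarks:

    - You started with /dev/vfio/11 rather than /dev/vfio/10. That should not matter, since the failure happens at the 8th container rather than the 12th.

    - It would be useful to see the actual qemu command line, to check whether we can spot something obvious.

    - The dmesg log might give us more clues.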
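
One possible way to collect the two pieces of information asked for above is sketched below. The function names `qemu_cmdlines` and `vfio_lines` are hypothetical, and the keyword list (vfio, iommu, DMAR) is an assumption about which kernel messages are likely to be relevant on an Intel host, not a complete filter.

```shell
# Hypothetical helper: print the command line of every running qemu process.
# The [q] trick keeps grep from matching its own command line.
qemu_cmdlines() {
    ps -eo args | grep '[q]emu-system'
}

# Hypothetical helper: keep only log lines mentioning VFIO or the IOMMU.
# Reads log text on stdin; DMAR is the prefix used by the Intel IOMMU driver.
vfio_lines() {
    grep -i -E 'vfio|iommu|dmar'
}

# Example use right after a failed start of the 8th container:
#   qemu_cmdlines
#   sudo dmesg | vfio_lines
```

Since the failing sandbox is torn down quickly, the qemu command line of one of the seven working containers may be the easiest one to capture for comparison.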

    Original answerer: GitHub user c3d

    2023-05-17 16:23:42