On Thu, Mar 13, 2025 at 04:37:16PM +0100, Stefano Garzarella wrote:
> Hi Bobby,
> first of all, thank you for starting this work again!

You're welcome, thank you for your work getting it started!

> On Wed, Mar 12, 2025 at 07:28:33PM -0700, Bobby Eshleman wrote:
> > Hey all,
> >
> > Apologies for forgetting the 'net-next' prefix on this one. Should I
> > resend or no?
>
> I'd say let's do a first review cycle on this, then you can re-post.
> Please check also the maintainers cc'ed, it looks like someone is missing:
> https://patchwork.kernel.org/project/netdevbpf/patch/20250312-vsock-netns-v2-1-84bffa1aa97a@xxxxxxxxx/

Duly noted, I'll double-check the ccs next time. sgtm on the re-post!

> > On Wed, Mar 12, 2025 at 01:59:34PM -0700, Bobby Eshleman wrote:
> > > Picking up Stefano's v1 [1], this series adds netns support to
> > > vhost-vsock. Unlike v1, this series does not address guest-to-host (g2h)
> > > namespaces, deferring that for future implementation and discussion.
> > >
> > > Any vsock created with /dev/vhost-vsock is a global vsock, accessible
> > > from any namespace. Any vsock created with /dev/vhost-vsock-netns is a
> > > "scoped" vsock, accessible only to sockets in its namespace. If a global
> > > vsock and a scoped vsock share the same CID, the scoped vsock takes
> > > precedence.
>
> This is inside the netns, right?
> I mean, if we are in a netns, and there is a VM A attached to
> /dev/vhost-vsock-netns with CID=42 and a VM B attached to /dev/vhost-vsock
> also with CID=42, this means that VM A will not be accessible in the netns,
> but it can be accessible outside of the netns,
> right?

In this scenario, CID=42 goes to VM A (/dev/vhost-vsock-netns) for any socket
in its namespace. For any other namespace, CID=42 will go to VM B
(/dev/vhost-vsock).
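For concreteness, here is a rough model of the lookup order I'm describing.
This is plain illustrative Python, not the kernel code; all names
(`scoped_vsocks`, `resolve_cid`, etc.) are hypothetical:

```python
# Rough model of CID resolution: a per-netns "scoped" table is
# consulted first, falling back to the global table.

scoped_vsocks = {("ns1", 42): "VM A"}   # created via /dev/vhost-vsock-netns
global_vsocks = {42: "VM B"}            # created via /dev/vhost-vsock

def resolve_cid(netns, cid):
    """Return the VM a connect(cid) issued from `netns` would reach."""
    # A scoped vsock shadows a global vsock with the same CID, but
    # only for sockets in its own namespace.
    if (netns, cid) in scoped_vsocks:
        return scoped_vsocks[(netns, cid)]
    return global_vsocks.get(cid)

print(resolve_cid("ns1", 42))  # VM A: scoped match wins inside its netns
print(resolve_cid("ns2", 42))  # VM B: other namespaces see only the global vsock
```

If the scoped vsock is removed (VM A goes away), the namespace naturally
falls back to the global entry, which matches the behavior described below.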
If I understand your setup correctly:

Namespace 1:
  VM A - /dev/vhost-vsock-netns, CID=42
  Process X
Namespace 2:
  VM B - /dev/vhost-vsock, CID=42
  Process Y
Namespace 3:
  Process Z

In this scenario, taking connect() as an example:

  Process X connect(CID=42) goes to VM A
  Process Y connect(CID=42) goes to VM B
  Process Z connect(CID=42) goes to VM B

If VM A goes away (migration, shutdown, etc...):

  Process X connect(CID=42) also goes to VM B

> > >
> > > If a socket in a namespace connects with a global vsock, the CID becomes
> > > unavailable to any VMM in that namespace when creating new vsocks. If
> > > disconnected, the CID becomes available again.
>
> IIUC, if an application on the host running in a netns is connected to a
> guest attached to /dev/vhost-vsock (e.g. CID=42), a new guest can't ask
> for the same CID (42) on /dev/vhost-vsock-netns in the same netns while that
> connection is active. Is that right?

Right. Here is the scenario I am trying to avoid:

Step 1: namespace 1, VM A allocated with CID 42 on /dev/vhost-vsock
Step 2: namespace 2, connect(CID=42) (this is legal, preserves old behavior)
Step 3: namespace 2, VM B allocated with CID 42 on /dev/vhost-vsock-netns

After step 3, CID=42 in the current namespace should belong to VM B, but the
connection from step 2 would be with VM A.

I think we have some options:

1. disallow the new VM B because the namespace is already active with VM A
2. try to allow the connection to resume, but make sure that new
   connections go to VM B
3. close the connection from namespace 2, spin up VM B, and hope the user
   manages the connection retry
4. auto-retry the connect to the new VM B? (seems like doing too much on the
   kernel side to me)

I chose option 1 for this rev, mostly for its simplicity, but I'm definitely
open to suggestions. I think option 3 is also a simple implementation.
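In pseudo-Python, option 1 amounts to something like the following. This is
only meant to pin down the intended behavior; the names
(`active_global_connections`, `create_scoped_vsock`) are hypothetical, not the
kernel's actual data structures:

```python
import errno

# (netns, cid) pairs with a live connection to a *global* vsock;
# step 2 of the scenario above populates this.
active_global_connections = {("ns2", 42)}

def create_scoped_vsock(netns, cid):
    """Model of allocating a CID on /dev/vhost-vsock-netns."""
    if (netns, cid) in active_global_connections:
        # Step 3 is rejected: the CID is still pinned to VM A for this
        # netns until the existing connection goes away.
        raise OSError(errno.EADDRINUSE,
                      "CID in use by an active global vsock connection")
    return (netns, cid)

try:
    create_scoped_vsock("ns2", 42)
except OSError as e:
    print(e.errno == errno.EADDRINUSE)  # True
```

Once the step-2 connection is torn down and the pair is removed from the set,
the same create call succeeds, which is the "CID becomes available again"
behavior from the cover letter.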
Option 2 would require adding some concept of "vhost-vsock ns at time of
connection" to each socket, so the transport would know which vhost_vsock to
use for which socket.

> > >
> > > Testing
> > >
> > > QEMU with /dev/vhost-vsock-netns support:
> > > https://github.com/beshleman/qemu/tree/vsock-netns
>
> You can also use unmodified QEMU via the `vhostfd` parameter of the
> `vhost-vsock-pci` device:
>
>   # FD will contain the file descriptor to /dev/vhost-vsock-netns
>   exec {FD}<>/dev/vhost-vsock-netns
>
>   # pass FD to the device; this is used for example by libvirt
>   qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>     -drive file=fedora.qcow2,format=qcow2,if=virtio \
>     -object memory-backend-memfd,id=mem,size=512M \
>     -device vhost-vsock-pci,vhostfd=${FD},guest-cid=42 -nographic

Very nice, thanks, I didn't realize that!

> That said, I agree we can extend QEMU with a `netns` param too.

I'm open to either. Your solution above is super elegant.

> BTW, I'm traveling. I'll be back next Tuesday and I hope to take a deeper
> look at the patches.
>
> Thanks,
> Stefano

Thanks Stefano! Enjoy the travel.

Best,
Bobby