On Thu, Mar 13, 2025 at 04:37:16PM +0100, Stefano Garzarella wrote:
> Hi Bobby,
> first of all, thank you for starting this work again!

You're welcome, thank you for your work getting it started!

> On Wed, Mar 12, 2025 at 07:28:33PM -0700, Bobby Eshleman wrote:
> > Hey all,
> >
> > Apologies for forgetting the 'net-next' prefix on this one. Should I
> > resend or no?
>
> I'd say let's do a first review cycle on this, then you can re-post.
> Please check also the maintainers cc'ed, it looks like someone is missing:
> https://patchwork.kernel.org/project/netdevbpf/patch/20250312-vsock-netns-v2-1-84bffa1aa97a@xxxxxxxxx/

Duly noted, I'll double-check the ccs next time. sgtm on the re-post!

> > On Wed, Mar 12, 2025 at 01:59:34PM -0700, Bobby Eshleman wrote:
> > > Picking up Stefano's v1 [1], this series adds netns support to
> > > vhost-vsock. Unlike v1, this series does not address guest-to-host (g2h)
> > > namespaces, deferring that for future implementation and discussion.
> > >
> > > Any vsock created with /dev/vhost-vsock is a global vsock, accessible
> > > from any namespace. Any vsock created with /dev/vhost-vsock-netns is a
> > > "scoped" vsock, accessible only to sockets in its namespace. If a global
> > > vsock and a scoped vsock share the same CID, the scoped vsock takes
> > > precedence.
>
> This is inside the netns, right?
> I mean, if we are in a netns, and there is a VM A attached to
> /dev/vhost-vsock-netns with CID=42 and a VM B attached to /dev/vhost-vsock
> also with CID=42, this means that VM A will not be accessible in the netns,
> but it can be accessible outside of the netns,
> right?

In this scenario, CID=42 goes to VM A (/dev/vhost-vsock-netns) for any socket
in its namespace. For any other namespace, CID=42 will go to VM B
(/dev/vhost-vsock).
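For concreteness, here is a rough model of the lookup order I'm describing.
This is plain illustrative Python, not the kernel code; all names
(`scoped_vsocks`, `resolve_cid`, etc.) are hypothetical:

```python
# Rough model of CID resolution: a per-netns "scoped" table is
# consulted first, falling back to the global table.

scoped_vsocks = {("ns1", 42): "VM A"}   # created via /dev/vhost-vsock-netns
global_vsocks = {42: "VM B"}            # created via /dev/vhost-vsock

def resolve_cid(netns, cid):
    """Return the VM a connect(cid) issued from `netns` would reach."""
    # A scoped vsock shadows a global vsock with the same CID, but
    # only for sockets in its own namespace.
    if (netns, cid) in scoped_vsocks:
        return scoped_vsocks[(netns, cid)]
    return global_vsocks.get(cid)

print(resolve_cid("ns1", 42))  # VM A: scoped match wins inside its netns
print(resolve_cid("ns2", 42))  # VM B: other namespaces see only the global vsock
```

If the scoped vsock is removed (VM A goes away), the namespace naturally
falls back to the global entry, which matches the behavior described below.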
If I understand your setup correctly:

Namespace 1:
  VM A - /dev/vhost-vsock-netns, CID=42
  Process X
Namespace 2:
  VM B - /dev/vhost-vsock, CID=42
  Process Y
Namespace 3:
  Process Z

In this scenario, taking connect() as an example:

  Process X connect(CID=42) goes to VM A
  Process Y connect(CID=42) goes to VM B
  Process Z connect(CID=42) goes to VM B

If VM A goes away (migration, shutdown, etc...):

  Process X connect(CID=42) also goes to VM B

> > >
> > > If a socket in a namespace connects with a global vsock, the CID becomes
> > > unavailable to any VMM in that namespace when creating new vsocks. If
> > > disconnected, the CID becomes available again.
>
> IIUC, if an application on the host running in a netns is connected to a
> guest attached to /dev/vhost-vsock (e.g. CID=42), a new guest can't ask
> for the same CID (42) on /dev/vhost-vsock-netns in the same netns while that
> connection is active. Is that right?

Right. Here is the scenario I am trying to avoid:

Step 1: namespace 1, VM A allocated with CID 42 on /dev/vhost-vsock
Step 2: namespace 2, connect(CID=42) (this is legal, preserves old behavior)
Step 3: namespace 2, VM B allocated with CID 42 on /dev/vhost-vsock-netns

After step 3, CID=42 in the current namespace should belong to VM B, but the
connection from step 2 would be with VM A.

I think we have some options:

1. disallow the new VM B because the namespace is already active with VM A
2. try to allow the connection to resume, but make sure that new
   connections go to VM B
3. close the connection from namespace 2, spin up VM B, and hope the user
   manages the connection retry
4. auto-retry the connect to the new VM B? (seems like doing too much on the
   kernel side to me)

I chose option 1 for this rev, mostly for its simplicity, but I'm definitely
open to suggestions. I think option 3 is also a simple implementation.
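In pseudo-Python, option 1 amounts to something like the following. This is
only meant to pin down the intended behavior; the names
(`active_global_connections`, `create_scoped_vsock`) are hypothetical, not the
kernel's actual data structures:

```python
import errno

# (netns, cid) pairs with a live connection to a *global* vsock;
# step 2 of the scenario above populates this.
active_global_connections = {("ns2", 42)}

def create_scoped_vsock(netns, cid):
    """Model of allocating a CID on /dev/vhost-vsock-netns."""
    if (netns, cid) in active_global_connections:
        # Step 3 is rejected: the CID is still pinned to VM A for this
        # netns until the existing connection goes away.
        raise OSError(errno.EADDRINUSE,
                      "CID in use by an active global vsock connection")
    return (netns, cid)

try:
    create_scoped_vsock("ns2", 42)
except OSError as e:
    print(e.errno == errno.EADDRINUSE)  # True
```

Once the step-2 connection is torn down and the pair is removed from the set,
the same create call succeeds, which is the "CID becomes available again"
behavior from the cover letter.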
Option 2 would require adding some concept of "vhost-vsock ns at time of
connection" to each socket, so the transport would know which vhost_vsock to
use for which socket.

> > >
> > > Testing
> > >
> > > QEMU with /dev/vhost-vsock-netns support:
> > > https://github.com/beshleman/qemu/tree/vsock-netns
>
> You can also use unmodified QEMU via the `vhostfd` parameter of the
> `vhost-vsock-pci` device:
>
>   # FD will contain the file descriptor to /dev/vhost-vsock-netns
>   exec {FD}<>/dev/vhost-vsock-netns
>
>   # pass FD to the device; this is used for example by libvirt
>   qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>     -drive file=fedora.qcow2,format=qcow2,if=virtio \
>     -object memory-backend-memfd,id=mem,size=512M \
>     -device vhost-vsock-pci,vhostfd=${FD},guest-cid=42 -nographic

Very nice, thanks, I didn't realize that!

> That said, I agree we can extend QEMU with a `netns` param too.

I'm open to either. Your solution above is super elegant.

> BTW, I'm traveling. I'll be back next Tuesday and I hope to take a deeper
> look at the patches.
>
> Thanks,
> Stefano

Thanks Stefano! Enjoy the travel.

Best,
Bobby