Hey Stefano,

Thanks a lot for all the details. I guess my next step is to try to
implement the forwarding logic in vhost-device-vsock and take it from
there.

Regards,
Dorjoy

On Tue, Jul 2, 2024 at 6:05 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>
> On Sun, Jun 30, 2024 at 04:54:18PM GMT, Dorjoy Chowdhury wrote:
> >Hey Stefano,
> >Apart from my questions in my previous email, I have some others as well.
> >
> >If the vhost-device-vsock modification to forward packets to
> >VMADDR_CID_LOCAL is implemented, does the VMADDR_FLAG_TO_HOST need to
> >be set by any application in the guest? I understand that the flag is
> >set automatically in the listen path by the driver (ref:
> >https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@xxxxxxxxxx/#2594117
> >), but from the comments in the referenced patch, I am guessing the
> >applications in the guest that will "connect" (as opposed to listen)
> >will need to set the flag in the application code? So does the
> >VMADDR_FLAG_TO_HOST flag need to be set by the applications in the
> >guest that will "connect", or should it work without it? I am asking
> >because the nitro-enclave VMs have an "init" which tries to connect to
> >CID 3 to send a "hello" on boot to let the parent VM know that it
> >booted, expecting a "hello" reply, but the init doesn't seem to set the
> >flag https://github.com/aws/aws-nitro-enclaves-sdk-bootstrap/blob/main/init/init.c#L356C1-L361C7
>
> Looking at the af_vsock.c code, it looks like, if we don't have any
> H2G transports (e.g. vhost-vsock) loaded in the VM (this is loaded for
> nested VMs, so I guess for a nitro-enclave VM this should not be the
> case), the packets are forwarded to the host in any case.
>
> See
> https://elixir.bootlin.com/linux/latest/source/net/vmw_vsock/af_vsock.c#L469
>
> >.
> >
> >I was following
> >https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock#sibling-vm-communication
> >to test if sibling communication works, and it seems like I didn't need
> >to modify "socat" to set "VMADDR_FLAG_TO_HOST". I am wondering
> >why it works without any modification. Here is what I do:
> >
> >shell1: ./vhost-device-vsock --vm
> >guest-cid=3,uds-path=/tmp/vm3.vsock,socket=/tmp/vhost3.socket --vm
> >guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket
> >
> >shell2: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
> >-enable-kvm -m 8G -nic user,model=virtio -drive
> >file=/home/dorjoy/Forks/test_vm/fedora2.qcow2,media=disk,if=virtio
> >--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
> >socket,id=char0,reconnect=0,path=/tmp/vhost3.socket -device
> >vhost-user-vsock-pci,chardev=char0
>
> inside this guest I run: socat - VSOCK-LISTEN:9000
>
> >shell3: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
> >-enable-kvm -m 8G -nic user,model=virtio -drive
> >file=/home/dorjoy/Forks/test_vm/fedora40.qcow2,media=disk,if=virtio
> >--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
> >socket,id=char0,reconnect=0,path=/tmp/vhost4.socket -device
> >vhost-user-vsock-pci,chardev=char0
>
> inside this guest I run: socat - VSOCK-CONNECT:3:9000
>
> >Then when I type something in the socat terminal of one VM and hit
> >'enter', it pops up in the socat terminal of the other VM. From the
> >documentation of vhost-device-vsock, I thought I would need to
> >patch socat to set "VMADDR_FLAG_TO_HOST", but I did not do anything
> >with socat. I simply did "sudo dnf install socat" in both VMs.
> >I also looked into the socat source code and I didn't see any reference
> >to "VMADDR_FLAG_TO_HOST". I am running "Fedora 40" on both VMs. Do you
> >know why it works without the flag?
>
> Yep, so the driver will forward them if the H2G transport is not loaded,
> like in your case. So if you set VMADDR_FLAG_TO_HOST, you are sure that
> it is always forwarded to the host; if you don't set it, it is forwarded
> only if you don't have a nested VM using vhost-vsock. In that case we
> can't differentiate between communication with a nested guest and a
> sibling guest; for this reason we added the flag.
>
> If the host uses vhost-vsock, those packets are discarded, but
> vhost-device-vsock handles them.
>
> Hope this clarifies.
>
> Stefano
>
> >
> >On Wed, Jun 26, 2024 at 11:43 PM Dorjoy Chowdhury
> ><dorjoychy111@xxxxxxxxx> wrote:
> >>
> >> Hey Stefano,
> >> Thanks a lot for all the details. I will look into them and reach out
> >> if I need further input. Thanks! I have tried to summarize my
> >> understanding below. Let me know if that sounds correct.
> >>
> >> On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
> >> >
> >> > Hi Dorjoy,
> >> >
> >> > On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
> >> > >Hey Stefano,
> >> >
> >> > [...]
> >> >
> >> > >> >
> >> > >> >So the immediate plan would be to:
> >> > >> >
> >> > >> > 1) Build a new vhost-vsock-forward object model that connects to
> >> > >> >vhost as CID 3 and then forwards every packet from CID 1 to the
> >> > >> >Enclave-CID and every packet that arrives on CID 3 to CID 2.
> >> > >>
> >> > >> This, though, requires writing the virtio-vsock emulation in QEMU
> >> > >> completely from scratch. If you have time, that would be great;
> >> > >> otherwise, if you want to do a PoC, my advice is to start with
> >> > >> vhost-user-vsock, which is already there.
> >> > >>
> >> > >
> >> > >Can you give me some more details about how I can implement the
> >> > >daemon?
> >> >
> >> > We already have a daemon written in Rust, so I don't recommend you
> >> > rewrite one from scratch; just start with that. You can find the daemon
> >> > and instructions on how to use it with QEMU here [1].
> >> >
> >> > >I would appreciate some pointers to code too.
> >> >
> >> > I sent the pointer to it in my first reply [2].
> >> >
> >> > >
> >> > >Right now, the "nitro-enclave" machine type (wip) in QEMU
> >> > >automatically spawns a VHOST_VSOCK device with the CID equal to the
> >> > >"guest-cid" machine option. I think this is equivalent to using the
> >> > >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
> >> > >need any change? I guess instead of "vhost-vsock-device", the
> >> > >vhost-vsock device needs to be equivalent to "-device
> >> > >vhost-user-vsock-device,guest-cid=N"?
> >> >
> >> > Nope, the vhost-user-vsock device requires just a `chardev` option.
> >> > The chardev points to the Unix socket used by QEMU to talk with the
> >> > daemon. The daemon has a parameter to set the CID. See [1] for the
> >> > examples.
> >> >
> >> > >
> >> > >The applications inside the nitro-enclave VM will still connect and
> >> > >talk to CID 3. So on the daemon side, do we need to spawn a device
> >> > >that has CID 3 and then forward everything this device receives to
> >> > >CID 1 (VMADDR_CID_LOCAL) on the same port, and everything it receives
> >> > >from CID 1 to the "guest-cid"?
> >> >
> >> > Yep, I think this is right.
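To make that idea concrete, here is a rough user-space sketch in C of what
"forward to VMADDR_CID_LOCAL on the same port" means at the socket level.
It is only an illustration: the names connect_local, relay and guest_fd are
made up here, and the real vhost-device-vsock daemon is written in Rust and
operates on virtio-vsock packets rather than a plain file descriptor.

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Open a host-side AF_VSOCK connection to (CID 1, same port as the guest). */
int connect_local(unsigned int port)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_LOCAL,  /* CID 1, needs vsock_loopback */
        .svm_port   = port,              /* same port the guest targeted */
    };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect to VMADDR_CID_LOCAL");
        if (fd >= 0)
            close(fd);
        return -1;
    }
    return fd;
}

/* Shuttle bytes from the guest side to the host application. */
void relay(int guest_fd, int local_fd)
{
    /* One direction only, for brevity: guest -> host application. */
    char buf[4096];
    ssize_t n;

    while ((n = read(guest_fd, buf, sizeof(buf))) > 0)
        if (write(local_fd, buf, (size_t)n) != n)
            break;
}

The other half of the job (forwarding whatever arrives from CID 1 back to
the guest-cid) would be the mirror image of relay().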
> >> > Note: to use VMADDR_CID_LOCAL, the host needs to load the
> >> > `vsock_loopback` kernel module.
> >> >
> >> > Before modifying the code, if you want to do some testing, perhaps you
> >> > can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
> >> > now exposes two Unix sockets: one is used to communicate with QEMU via
> >> > the vhost-user protocol, and the other is to be used by the application
> >> > to communicate with vsock sockets in the guest using the hybrid protocol
> >> > defined by Firecracker. So you could initiate a socat between the latter
> >> > and VMADDR_CID_LOCAL; the only problem I see is that you have to send
> >> > the first string required by the hybrid protocol (CONNECT 1234), but for
> >> > a PoC it should be ok.
> >> >
> >> > I just tried the following and it works without touching any code:
> >> >
> >> > shell1$ ./target/debug/vhost-device-vsock \
> >> > --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
> >> >
> >> > shell2$ sudo modprobe vsock_loopback
> >> > shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
> >> >
> >> > shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
> >> > -drive file=fedora40.qcow2,format=qcow2,if=virtio \
> >> > -chardev socket,id=char0,path=/tmp/vhost3.socket \
> >> > -device vhost-user-vsock-pci,chardev=char0 \
> >> > -object memory-backend-memfd,id=mem,size=512M \
> >> > -nographic
> >> >
> >> > guest$ nc --vsock -l 1234
> >> >
> >> > shell4$ nc --vsock 1 1234
> >> > CONNECT 1234
> >> >
> >> > Note: the `CONNECT 1234` is required by the hybrid vsock protocol
> >> > defined by Firecracker, so if we extend the vhost-device-vsock
> >> > daemon to forward packets to VMADDR_CID_LOCAL, that would not be
> >> > needed (including running socat).
> >> >
> >> Understood. Just trying to think out loud about what the final UX will
> >> be from the user's perspective to successfully run a nitro VM, before I
> >> try to modify vhost-device-vsock to support forwarding to
> >> VMADDR_CID_LOCAL.
> >> I guess because the "vhost-user-vsock" device needs to be spawned
> >> implicitly (without any explicit option) inside nitro-enclave in QEMU,
> >> we now need to provide the "chardev" as a machine option, so the
> >> nitro-enclave command would look something like below:
> >> "./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
> >> /path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
> >> --enable-kvm -cpu host"
> >> and then, in the code, set the chardev id on the vhost-user-vsock
> >> device from the machine option.
> >>
> >> The modified "vhost-device-vsock" would need to be run with the new
> >> option that will forward everything to VMADDR_CID_LOCAL (below, by
> >> "-z" I mean the new option):
> >> "./target/debug/vhost-device-vsock -z --vm
> >> guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
> >> This means the guest-cid of the nitro VM is CID 5, right?
> >>
> >> And the applications on the host would need to use VMADDR_CID_LOCAL
> >> for communication instead of the "guest-cid" (5) (assuming
> >> vsock_loopback is modprobed). Let's say there are 2 applications inside
> >> the nitro VM that connect to CID 3 on ports 9000 and 9001, and the
> >> applications on the host listen on 9000 and 9001 using VMADDR_CID_LOCAL.
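For illustration only, a host application playing the role of one of those
listeners could bind to VMADDR_CID_LOCAL on port 9000 roughly like this; the
port number just matches the example above, and vsock_loopback is assumed to
be already modprobed.

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_LOCAL,  /* CID 1 instead of the guest-cid */
        .svm_port   = 9000,              /* example port from the thread */
    };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    char buf[4096];
    ssize_t n;

    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        perror("vsock listen on VMADDR_CID_LOCAL");
        return 1;
    }

    int conn = accept(fd, NULL, NULL);   /* forwarded enclave connection */
    if (conn < 0) {
        perror("accept");
        return 1;
    }

    while ((n = read(conn, buf, sizeof(buf))) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);  /* e.g. print the "hello" */

    close(conn);
    close(fd);
    return 0;
}

For a quick manual test, something like `nc --vsock -l 9000` on the host
should behave similarly to this sketch.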
> >> So, after the commands above (qemu VM and vhost-device-vsock) are run,
> >> the communication between the applications on the host and the
> >> applications in the nitro VM on ports 9000 and 9001 should just work,
> >> right, without needing to run any extra socat commands or such? Or
> >> will the user still need to run some socat commands for all the
> >> relevant ports (e.g., 9000 and 9001)?
> >>
> >> I am just wondering what kind of changes are needed in
> >> vhost-device-vsock to forward packets to VMADDR_CID_LOCAL. Would it be
> >> something like this: in the codepath that handles "/tmp/vm5.vsock",
> >> upon receiving a "connect" (from inside the nitro VM) for any port,
> >> vhost-device-vsock just connects to the same port over AF_VSOCK using
> >> the socket system calls, and messages received on that port from
> >> "/tmp/vm5.vsock" are then sent to the AF_VSOCK socket? Or am I not
> >> thinking about this right, and would the implementation be something
> >> different entirely (e.g., change the CID from 3 to 2 (or 1?) on the
> >> packets before they are handled, in which case socat would probably
> >> still be needed)?
> >>
> >> Thanks and Regards,
> >> Dorjoy
> >
>
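As a closing reference for the VMADDR_FLAG_TO_HOST question at the top of
the thread, here is a minimal sketch of how a connecting application inside
the guest could set the flag explicitly; per the discussion above, this is
only strictly needed when an H2G transport such as vhost-vsock is also
loaded in the guest. CID 3 and port 9000 are just the example values used
earlier, and the code is not taken from the Nitro init.

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = 3,                    /* "parent" CID the enclave talks to */
        .svm_port   = 9000,                 /* example port, not from init.c */
        .svm_flags  = VMADDR_FLAG_TO_HOST,  /* always forward to the host */
    };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("vsock connect");
        return 1;
    }

    write(fd, "hello", 5);                  /* init-style hello, for example */
    close(fd);
    return 0;
}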