Re: How to implement message forwarding from one CID to another in vhost driver

On Tue, May 28, 2024 at 05:19:34PM GMT, Paolo Bonzini wrote:
On 5/27/24 09:54, Alexander Graf wrote:

On 27.05.24 09:08, Alexander Graf wrote:
Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:
On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:
Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1 patch series has already been posted to the qemu-devel mailing list[2].

AWS Nitro Enclaves is an Amazon EC2[3] feature that allows creating isolated execution environments, called enclaves, from Amazon EC2 instances; they are used for processing highly sensitive data. Enclaves have no persistent storage and no external networking. The enclave VMs are based on the Firecracker microVM and have a vhost-vsock device for communication with the parent EC2 instance that spawned them, and a Nitro Secure Module (NSM) device for cryptographic attestation. The parent instance VM always has CID 3, while the enclave VM gets a dynamic CID. The enclave VMs can communicate with the parent instance over various ports on CID 3. For example, the init process inside an enclave sends a heartbeat to port 9000 upon boot and expects a heartbeat reply, letting the parent instance know that the enclave VM has successfully booted.

The plan is to eventually make the nitro enclave emulation in QEMU standalone
i.e., without needing to run another VM with CID 3 with proper vsock

If you don't have to launch another VM, maybe we can avoid vhost-vsock and
emulate virtio-vsock in user-space, having complete control over the
behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4]
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so maybe
with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock


Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.


communication support. For this to work, one approach could be to teach the
vhost driver in kernel to forward CID 3 messages to another CID N

So in this case both CID 3 and N would be assigned to the same QEMU
process?


CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case, as we
don't want a parent VM (we want to run enclave VMs standalone), we
would need to forward the CID 3 messages to the host CID. I don't know if
it means CID 3 and CID N are assigned to the same QEMU process. Sorry.


There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario,
the "parent instance" that shows up as CID 3 in the Enclave doesn't
really exist. Instead, when the Enclave attempts to talk to CID 3, it
should really land on CID 0 (hypervisor). When the hypervisor tries to
connect to the Enclave on port X, it should look as if it originates
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario.
Here, we have multiple "parent instances". Each of them thinks it's
CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
parent. For this case, I think implementing all of virtio-vsock in
user space is the best path forward. But in theory, you could also
swizzle CIDs to make random "real" CIDs appear as CID 3.


Thank you for clarifying the use cases!

Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
it's easier to go into user-space with vhost-user-vsock or the built-in
device.


Sorry, I believe I meant CID 2. Effectively for case 1, when a process on the hypervisor listens on port 1234, it should be visible as 3:1234 from the VM and when the hypervisor process connects to <VM CID>:1234, it should look as if that connection came from CID 3.
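
To make the intended behaviour for case 1 concrete, here is a minimal sketch of the CID rewrite described above, written as a plain Python model rather than kernel or QEMU code. The `Packet` type and the two function names are hypothetical and exist only for illustration; the only values taken from the thread are CID 2 (host) and CID 3 (the parent the enclave expects).

```python
from typing import NamedTuple

VMADDR_CID_HOST = 2  # well-known host CID in Linux vsock
PARENT_CID = 3       # CID the enclave expects its parent to have


class Packet(NamedTuple):
    """Hypothetical stand-in for the addressing fields of a vsock packet."""
    src_cid: int
    src_port: int
    dst_cid: int
    dst_port: int


def guest_to_host(pkt: Packet) -> Packet:
    """Guest talks to 3:<port>; deliver it to the host (CID 2) instead."""
    if pkt.dst_cid == PARENT_CID:
        return pkt._replace(dst_cid=VMADDR_CID_HOST)
    return pkt


def host_to_guest(pkt: Packet) -> Packet:
    """Host connects to <VM CID>:<port>; make it appear to come from CID 3."""
    if pkt.src_cid == VMADDR_CID_HOST:
        return pkt._replace(src_cid=PARENT_CID)
    return pkt
```

With this model, a guest packet addressed to 3:1234 is handed to the host as 2:1234, and a host-originated packet reaches the guest with source CID 3. Where such a rewrite would actually live (vhost-vsock, a user-space device, or elsewhere) is exactly the open question in this thread.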


Now that I'm thinking about my message again: What if we just introduce a sysfs/sysctl file for vsock that indicates the "host CID" (default: 2)? Users that want vhost-vsock to behave as if the host is CID 3 can just write 3 to it.

It means we'd need to change all references to VMADDR_CID_HOST to instead refer to a global variable that indicates the new "host CID". It'd need some more careful massaging to not break number namespace assumptions (<= CID_HOST no longer works), but the idea should fly.
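
As an illustration of why a range check of the `<= CID_HOST` kind stops working, here is a small Python model (not kernel code). The function names and the `host_cid` parameter are hypothetical, and the choice to keep CID 2 reserved after moving the host CID is an assumption of this sketch, not something decided in the thread.

```python
VMADDR_CID_HYPERVISOR = 0
VMADDR_CID_LOCAL = 1
VMADDR_CID_HOST = 2
VMADDR_CID_ANY = 0xFFFFFFFF  # -1U


def is_reserved_fixed(guest_cid: int) -> bool:
    """Range check that assumes the host CID is the largest reserved CID."""
    return guest_cid <= VMADDR_CID_HOST or guest_cid == VMADDR_CID_ANY


def is_reserved_tunable(guest_cid: int, host_cid: int = VMADDR_CID_HOST) -> bool:
    """Hypothetical check once the host CID is configurable.

    Assumption: CID 2 stays reserved even when host_cid is moved.
    """
    reserved = {
        VMADDR_CID_HYPERVISOR,
        VMADDR_CID_LOCAL,
        VMADDR_CID_HOST,
        host_cid,
        VMADDR_CID_ANY,
    }
    return guest_cid in reserved
```

With the host CID set to 3, `is_reserved_fixed(3)` still reports CID 3 as free for a guest, while `is_reserved_tunable(3, host_cid=3)` refuses it. Every comparison of this kind would need the same treatment.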

Forwarding one or more ports of a given CID to CID 2 (the host) should be doable with a dummy vhost client that listens to CID 3, connects to CID 2 and sends data back and forth.

Good idea, a kind of socat but one that can handle /dev/vhost-vsock. With rust-vmm crates it should be doable, but I think we would still need to extend vhost-vsock to support VMADDR_FLAG_TO_HOST, because for now it does not allow guests to send packets to the host with a destination CID other than 2.
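
For illustration only, the "send data back and forth" part of such a socat-like forwarder could look like the Python sketch below. It covers just the byte pump between two already-connected stream sockets; the hard part discussed here (acting as a vhost client on /dev/vhost-vsock and accepting traffic addressed to CID 3) is deliberately left out, and the function name `relay` is hypothetical.

```python
import selectors
import socket


def relay(a: socket.socket, b: socket.socket, bufsize: int = 4096) -> None:
    """Copy bytes in both directions until either side closes.

    Simplification: half-close is not handled; the relay stops as soon as
    one peer reports end-of-stream.
    """
    peers = {a: b, b: a}
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ)
    sel.register(b, selectors.EVENT_READ)
    try:
        while True:
            for key, _ in sel.select():
                src = key.fileobj
                data = src.recv(bufsize)
                if not data:
                    return  # peer closed its end
                peers[src].sendall(data)
    finally:
        sel.close()
```

In a real forwarder, one socket would come from the side that terminates the guest's CID 3 traffic and the other would be an AF_VSOCK (or other) connection towards the host service; both of those are assumptions outside this sketch.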

Not hard enough to justify changing all references to VMADDR_CID_HOST

I agree.

(and also I am not sure if vsock supports network namespaces?

nope, I had been working on it, but I could never finish it :-(
Tracking the work here: https://gitlab.com/vsock/vsock/-/issues/2

then the sysctl/sysfs way is not feasible because you cannot set it per-netns, can you?). It also has the disadvantages that different QEMU instances are not insulated.

I think it's either that or implementing virtio-vsock in userspace (https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@xxxxxxxxxx/, search for "To connect host<->guest").

In this case, AF_VSOCK can't be used in the host, right?
So it's similar to vhost-user-vsock.

Thanks,
Stefano




