Re: How to implement message forwarding from one CID to another in vhost driver

Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:
Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
patch series has already been posted to the qemu-devel mailing list[2].

AWS Nitro Enclaves is an Amazon EC2[3] feature that allows creating isolated
execution environments, called enclaves, from Amazon EC2 instances, which are
used for processing highly sensitive data. Enclaves have no persistent storage
and no external networking. The enclave VMs are based on the Firecracker microvm
and have a vhost-vsock device for communication with the parent EC2 instance
that spawned them, and a Nitro Secure Module (NSM) device for cryptographic
attestation. The parent instance VM always has CID 3 while the enclave VM gets
a dynamic CID. The enclave VMs can communicate with the parent instance over
various ports to CID 3; for example, the init process inside an enclave sends a
heartbeat to port 9000 upon boot, expecting a heartbeat reply, which lets the
parent instance know that the enclave VM has booted successfully.

The plan is to eventually make the nitro enclave emulation in QEMU standalone,
i.e., without needing to run another VM with CID 3 with proper vsock
If you don't have to launch another VM, maybe we can avoid vhost-vsock
and emulate virtio-vsock in user-space, having complete control over the
behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4]
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so maybe
with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock


Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.


communication support. For this to work, one approach could be to teach the
vhost driver in the kernel to forward CID 3 messages to another CID N
So in this case both CID 3 and N would be assigned to the same QEMU
process?


CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it works in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case, as we
don't want a parent VM (we want to run enclave VMs standalone), we
would need to forward the CID 3 messages to the host CID. I don't know
if that means CID 3 and CID N are assigned to the same QEMU process. Sorry.


There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario, the "parent instance" that shows up as CID 3 in the Enclave doesn't really exist. Instead, when the Enclave attempts to talk to CID 3, it should really land on CID 0 (hypervisor). When the hypervisor tries to connect to the Enclave on port X, it should look as if it originates from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario. Here, we have multiple "parent instances". Each of them thinks it's CID 3. Each can spawn an Enclave that talks to CID 3 and reach the parent. For this case, I think implementing all of virtio-vsock in user space is the best path forward. But in theory, you could also swizzle CIDs to make random "real" CIDs appear as CID 3.
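
To make the swizzling idea concrete, here is a rough sketch of what such a
rewrite could look like if a user-space device model (QEMU's built-in
virtio-vsock or vhost-user-vsock) touched each packet header. The struct
mirrors the virtio-vsock packet header from the virtio spec; the helper names
and the real_parent_cid parameter are made up for illustration and do not
exist anywhere today:

/*
 * Conceptual sketch only, not an existing API: rewrite CID 3 on the way
 * out of the enclave and restore it on the way back in.
 */
#include <stdint.h>
#include <endian.h>

struct virtio_vsock_hdr {
    uint64_t src_cid;
    uint64_t dst_cid;
    uint32_t src_port;
    uint32_t dst_port;
    uint32_t len;
    uint16_t type;
    uint16_t op;
    uint32_t flags;
    uint32_t buf_alloc;
    uint32_t fwd_cnt;
} __attribute__((packed));

#define NITRO_PARENT_CID 3ULL   /* what the enclave believes its parent is */

/* Enclave -> parent: packets addressed to the fictitious CID 3 are
 * redirected to the real peer (the host, or a configured parent-cid). */
static void rewrite_tx(struct virtio_vsock_hdr *hdr, uint64_t real_parent_cid)
{
    if (le64toh(hdr->dst_cid) == NITRO_PARENT_CID)
        hdr->dst_cid = htole64(real_parent_cid);
}

/* Parent -> enclave: replies are rewritten so they appear to originate
 * from CID 3, which is what the Nitro init code expects to see. */
static void rewrite_rx(struct virtio_vsock_hdr *hdr, uint64_t real_parent_cid)
{
    if (le64toh(hdr->src_cid) == real_parent_cid)
        hdr->src_cid = htole64(NITRO_PARENT_CID);
}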



Do you have to allocate 2 separate virtio-vsock devices, one for the
parent and one for the enclave?


If there is a parent VM, then I guess both parent and enclave VMs need
virtio-vsock devices.

(set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
and from N to 3 on responses. This will enable users of the
Will these messages have the VMADDR_FLAG_TO_HOST flag set?

We don't support this in vhost-vsock yet; if supporting it helps, we
might, but we need to better understand how to avoid security issues, so
maybe each device would need to explicitly enable the feature and specify
from which CIDs it accepts packets.


I don't know about the flag. So I don't know if it will be set. Sorry.


From the guest's point of view, the parent (CID 3) is just another VM. Since Linux as of

 https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@xxxxxxxxxx/#2594117

always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I would say the message has the flag set.

How would you envision the host implementing the flag? Would the host allow user space to listen on any CID and hence receive the respective target connections? And wouldn't listening on CID 0 then mean you're effectively listening on "any" other CID? Thinking about that a bit more, that may be just what we need, yes :)
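
For reference, the guest side of this is already expressible with today's
uAPI. A hedged sketch (port 9000 borrowed from the heartbeat description
earlier, error handling trimmed) of an enclave-side connect to the fictitious
parent, with VMADDR_FLAG_TO_HOST set explicitly even though kernels with the
patch above would set it automatically:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int connect_to_parent(void)
{
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    struct sockaddr_vm addr;

    if (fd < 0) {
        perror("socket");
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.svm_family = AF_VSOCK;
    addr.svm_cid    = 3;                    /* the fictitious parent */
    addr.svm_port   = 9000;                 /* heartbeat port */
    addr.svm_flags  = VMADDR_FLAG_TO_HOST;  /* hand the packet to the g2h transport */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return -1;
    }
    return fd;
}

On the host side, the closest existing analogue of listening on "any" CID is
binding to VMADDR_CID_ANY; there is a sketch of that further down.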




nitro-enclave machine
type in QEMU to run the necessary vsock servers/clients on the host machine
(some defaults can be implemented in QEMU as well, for example, sending a reply
to the heartbeat), which will rid them of the cumbersome way of running another
whole VM with CID 3. This way, users of the nitro-enclave machine in QEMU could
potentially also run multiple enclaves with their messages for CID 3 forwarded
to different CIDs, which, on the QEMU side, could then be specified using a new
machine type option (parent-cid) if implemented. I guess on the QEMU side, this
will be an ioctl call (or some other way) to indicate to the host kernel that
the CID 3 messages need to be forwarded. Does this approach of
What if there is already a VM with CID = 3 in the system?


Good question! I don't know what should happen in this case.


See case 2 above :). In a nutshell, I don't think it'd be legal to have a real CID 3 in that scenario.




forwarding CID 3 messages to another CID sound good?
It seems like too specific a case; if we can generalize it, maybe we could
make this change, but we would like to avoid complicating vhost-vsock
and keep it as simple as possible, so we don't end up having to implement
firewalls, etc.

So first I would see if vhost-user-vsock or the QEMU built-in device is
right for this use-case.
Thank you! I will check everything out and reach out if I need
further guidance about what needs to be done. And sorry that I wasn't
able to answer some of your questions.


As mentioned above, I think there is merit in both. I personally care a lot more about case 1 than case 2: we already have a working implementation of Nitro Enclaves in a cloud setup. What is missing is a way to easily run a Nitro Enclave locally for development.
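
For that local development flow, the host side can be approximated by a tiny
AF_VSOCK listener that answers the init heartbeat mentioned at the top of the
thread. A minimal sketch, assuming the heartbeat is a single byte that just
gets echoed back (that detail should be checked against the Nitro Enclaves
init sources) and assuming the enclave's CID 3 traffic already gets redirected
to the host, which is the open question in this thread:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    int lsock = socket(AF_VSOCK, SOCK_STREAM, 0);
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_ANY,   /* accept from any guest CID */
        .svm_port   = 9000,             /* heartbeat port */
    };

    if (lsock < 0 ||
        bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsock, 1) < 0) {
        perror("vsock listen");
        return 1;
    }

    for (;;) {
        int csock = accept(lsock, NULL, NULL);
        char byte;

        if (csock < 0)
            continue;
        /* Echo the heartbeat so the enclave init believes the parent
         * instance is alive. */
        if (read(csock, &byte, 1) == 1)
            write(csock, &byte, 1);
        close(csock);
    }
}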


Alex




