Re: How to implement message forwarding from one CID to another in vhost driver

Alexander Graf <graf@xxxxxxxxxx> · Mon, 27 May 2024 09:08:00 +0200

Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:
On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:
Hey Stefano,

Thanks for the reply.

On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
patch series has already been posted to the qemu-devel mailing list[2].

AWS Nitro Enclaves is an Amazon EC2[3] feature that allows creating isolated
execution environments, called enclaves, from Amazon EC2 instances, which are
used for processing highly sensitive data. Enclaves have no persistent storage
and no external networking. The enclave VMs are based on the Firecracker microvm
and have a vhost-vsock device for communication with the parent EC2 instance
that spawned them and a Nitro Secure Module (NSM) device for cryptographic
attestation. The parent instance VM always has CID 3 while the enclave VM gets
a dynamic CID. The enclave VMs can communicate with the parent instance over
various ports to CID 3. For example, the init process inside an enclave sends a
heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
parent instance know that the enclave VM has successfully booted.
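
As a rough illustration of the heartbeat exchange described above, here is a
minimal Python sketch of a responder that could run wherever the enclave's
CID 3 traffic ends up. The thread only says that a reply is expected, so the
one-byte message size and the echo behaviour are assumptions made for
illustration, and the names reply_to_heartbeat and serve_heartbeat are
hypothetical. socket.AF_VSOCK is only available on Linux.

```python
import socket

HEARTBEAT_PORT = 9000  # port named in the thread


def reply_to_heartbeat(conn) -> bytes:
    """Read one heartbeat message from conn and echo it back unchanged.

    Assumption: the heartbeat is a single byte and the expected reply is
    the same byte. Adjust if the real init process expects something else.
    """
    data = conn.recv(1)
    if data:
        conn.sendall(data)
    return data


def serve_heartbeat(port: int = HEARTBEAT_PORT) -> None:
    """Accept a single vsock connection and answer its heartbeat."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as srv:
        # VMADDR_CID_ANY: accept connections addressed to any local CID.
        srv.bind((socket.VMADDR_CID_ANY, port))
        srv.listen(1)
        conn, (peer_cid, peer_port) = srv.accept()
        with conn:
            reply_to_heartbeat(conn)
```

Keeping the reply logic separate from the AF_VSOCK setup means it can be
exercised over any stream socket, which is handy on machines without vsock
support.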

The plan is to eventually make the nitro enclave emulation in QEMU standalone
i.e., without needing to run another VM with CID 3 with proper vsock

If you don't have to launch another VM, maybe we can avoid vhost-vsock
and emulate virtio-vsock in user-space, having complete control over the
behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4]
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so maybe
with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock

Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.

communication support. For this to work, one approach could be to teach the
vhost driver in kernel to forward CID 3 messages to another CID N
So in this case both CID 3 and N would be assigned to the same QEMU
process?

CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case as we
don't want a parent VM (we want to run enclave VMs standalone) we
would need to forward the CID 3 messages to host CID. I don't know if
it means CID 3 and CID N are assigned to the same QEMU process. Sorry.

There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario,
the "parent instance" that shows up as CID 3 in the Enclave doesn't
really exist. Instead, when the Enclave attempts to talk to CID 3, it
should really land on CID 0 (hypervisor). When the hypervisor tries to
connect to the Enclave on port X, it should look as if it originates
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario.
Here, we have multiple "parent instances". Each of them thinks it's
CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
parent. For this case, I think implementing all of virtio-vsock in
user space is the best path forward. But in theory, you could also
swizzle CIDs to make random "real" CIDs appear as CID 3.

Thank you for clarifying the use cases!

Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
it's easier to go into user-space with vhost-user-vsock or the built-in
device.

Sorry, I believe I meant CID 2. Effectively for case 1, when a process 
on the hypervisor listens on port 1234, it should be visible as 3:1234 
from the VM and when the hypervisor process connects to <VM CID>:1234, 
it should look as if that connection came from CID 3.
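
The address rewrite described for case 1 could be sketched roughly as
follows. This is not something vhost-vsock does today; it only models the
mapping being asked for. VsockAddr, guest_to_host and host_to_guest are
hypothetical names, and the sketch assumes the host is addressed as CID 2.

```python
from dataclasses import dataclass

PARENT_CID = 3  # what the enclave believes its parent is
HOST_CID = 2    # CID of the host (VMADDR_CID_HOST)


@dataclass(frozen=True)
class VsockAddr:
    cid: int
    port: int


def guest_to_host(dst: VsockAddr) -> VsockAddr:
    """Rewrite the destination of a packet leaving the enclave.

    Traffic aimed at the (non-existent) parent on CID 3 lands on the host.
    """
    if dst.cid == PARENT_CID:
        return VsockAddr(HOST_CID, dst.port)
    return dst


def host_to_guest(src: VsockAddr) -> VsockAddr:
    """Rewrite the source of a packet entering the enclave.

    Traffic from the host is made to look as if it came from CID 3.
    """
    if src.cid == HOST_CID:
        return VsockAddr(PARENT_CID, src.port)
    return src
```

With this mapping, a host process listening on port 1234 is reachable from
the enclave as 3:1234, and a connection the host opens to the enclave shows
up there with source CID 3.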

Maybe initially with vhost-user-vsock it's easier because we already
have something that works and supports sibling communication (for case
2).

The problem with vhost-user-vsock is that you don't get to use AF_VSOCK 
as a host process.
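
To make the difference concrete: with a Firecracker-style hybrid vsock, a
host process reaches a guest port by connecting to a unix socket and sending
a text "CONNECT <port>" line, instead of opening an AF_VSOCK socket to
(CID, port). The sketch below assumes that vhost-device-vsock follows the
same handshake and that the backend answers with a line starting with "OK".
The function name connect_via_uds and the socket path in the usage note are
hypothetical.

```python
import socket


def connect_via_uds(sock: socket.socket, port: int) -> str:
    """Run the text handshake on an already connected unix stream socket.

    Sends "CONNECT <port>\\n" and reads one reply line. Returns the reply
    line (without the trailing newline) if it starts with "OK".
    """
    sock.sendall(f"CONNECT {port}\n".encode("ascii"))
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("backend closed the socket during handshake")
        reply += chunk
    line = reply.decode("ascii").strip()
    if not line.startswith("OK"):
        raise ConnectionError(f"unexpected handshake reply: {line!r}")
    return line
```

A parent application written for Nitro Enclaves would instead do something
like socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM).connect((cid, port)),
so it would have to be modified to open, say, /tmp/enclave.vsock (a
made-up path) and perform this handshake first.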

A typical Nitro Enclaves application is split into 2 parts: An 
in-Enclave component that listens/connects to vsock and a parent process 
that listens/connects to vsock. The experience of launching an Enclave 
is very similar to launching a QEMU VM: You run nitro-cli and tell it to 
pop up the Enclave based on an EIF file. Nitro-cli then tells you the 
CID that was allocated for the Enclave and you communicate to it using that.

What I would ideally like to have as development experience is that you 
run QEMU with unmodified Enclave components (the EIF file) and run your 
parent application unmodified on the host.

For that to work, the host application needs to be able to use AF_VSOCK.

I agree that for this conversation, we should just ignore case 2 and 
consider it as "solved" through vhost-user-vsock, as that can create its 
own CID namespace between different VMs.

Do you have to allocate 2 separate virtio-vsock devices, one for the
parent and one for the enclave?

If there is a parent VM, then I guess both parent and enclave VMs need
virtio-vsock devices.

(set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
and from N to 3 on responses. This will enable users of the
Will these messages have the VMADDR_FLAG_TO_HOST flag set?

We don't support this in vhost-vsock yet, if supporting it helps, we
might, but we need to better understand how to avoid security issues, so
maybe each device needs to explicitly enable the feature and specify
from which CIDs it accepts packets.

I don't know about the flag. So I don't know if it will be set. Sorry.

From the guest's point of view, the parent (CID 3) is just another VM.
Since Linux as of

 https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@xxxxxxxxxx/#2594117 

always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I
would say the message has the flag set.

How would you envision the host to implement the flag? Would the host
allow user space to listen on any CID and hence receive the respective
target connections? And wouldn't listening on CID 0 then mean you're
effectively listening to "any" other CID? Thinking about that a bit
more, that may be just what we need, yes :)

No, wait. I had assumed the flag was only meant to implement sibling
communication, so the host doesn't re-forward those packets to sockets
opened by applications in the host, but only to other VMs in the same
host. So the host would always only have CID 2 assigned (CID 0 is not
supported by vhost-vsock).

nitro-enclave machine
type in QEMU to run the necessary vsock server/clients in the host machine
(some defaults can be implemented in QEMU as well, for example, sending a reply
to the heartbeat) which will rid them of the cumbersome way of running another
whole VM with CID 3. This way, users of the nitro-enclave machine in QEMU could
potentially also run multiple enclaves with their messages for CID 3 forwarded
to different CIDs which, on the QEMU side, could then be specified using a new
machine type option (parent-cid) if implemented. I guess on the QEMU side, this
will be an ioctl call (or some other way) to indicate to the host kernel that
the CID 3 messages need to be forwarded. Does this approach of
What if there is already a VM with CID = 3 in the system?

Good question! I don't know what should happen in this case.

See case 2 above :). In a nutshell, I don't think it'd be legal to
have a real CID 3 in that scenario.

Yeah, with vhost-vsock we can't, but with vhost-user-vsock I think it is
fine since the guest CID is local for each instance. The host only sees
the unix socket (like with firecracker).

See above why a unix socket is not really great CX :)

forwarding CID 3 messages to another CID sound good?
It seems too specific a case. If we can generalize it, maybe we could
make this change, but we would like to avoid complicating vhost-vsock
and keep it as simple as possible, to avoid then having to implement
firewalls, etc.

So first I would see if vhost-user-vsock or the QEMU built-in device is
right for this use-case.
Thank you! I will check everything out and reach out if I need
further guidance about what needs to be done. And sorry as I wasn't
able to answer some of your questions.

As mentioned above, I think there is merit for both. I personally care
a lot more for case 1 over case 2: We already have a working
implementation of Nitro Enclaves in a Cloud setup. What is missing is
a way to easily run a Nitro Enclave locally for development.

If both are fine, then I would go more on modifying vhost-user-vsock or
adding a built-in device in QEMU.
We have more freedom there, and it is also easier to update/debug.

I agree on those points, but if we go down that route users can't simply 
reuse their existing code, no? At that point, they're probably better 
off just spawning another (micro)-VM on CID 3, as that at least gives 
them the ability to reuse their existing parent code.

Alex
