Re: How to implement message forwarding from one CID to another in vhost driver


 



Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:
On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:
Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1 patch series has already been posted to the qemu-devel mailing list[2].

AWS Nitro Enclaves is an Amazon EC2[3] feature that allows creating isolated execution environments, called enclaves, from Amazon EC2 instances. Enclaves are used for processing highly sensitive data and have no persistent storage and no external networking. The enclave VMs are based on the Firecracker microVM and have a vhost-vsock device for communication with the parent EC2 instance that spawned them, plus a Nitro Secure Module (NSM) device for cryptographic attestation. The parent instance VM always has CID 3, while the enclave VM gets a dynamic CID. The enclave VMs can communicate with the parent instance over various ports on CID 3. For example, the init process inside an enclave sends a heartbeat to port 9000 upon boot and expects a heartbeat reply, letting the
parent instance know that the enclave VM has successfully booted.

The plan is to eventually make the nitro enclave emulation in QEMU standalone
i.e., without needing to run another VM with CID 3 with proper vsock
If you don't have to launch another VM, maybe we can avoid vhost-vsock
and emulate virtio-vsock in user-space, having complete control over the
behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4]
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so maybe
with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock


Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.


communication support. For this to work, one approach could be to teach the
vhost driver in kernel to forward CID 3 messages to another CID N
So in this case both CID 3 and N would be assigned to the same QEMU
process?


CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case as we
don't want a parent VM (we want to run enclave VMs standalone) we
would need to forward the CID 3 messages to host CID. I don't know if
it means CID 3 and CID N are assigned to the same QEMU process. Sorry.


There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario,
the "parent instance" that shows up as CID 3 in the Enclave doesn't
really exist. Instead, when the Enclave attempts to talk to CID 3, it
should really land on CID 0 (hypervisor). When the hypervisor tries to
connect to the Enclave on port X, it should look as if it originates
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario.
Here, we have multiple "parent instances". Each of them thinks it's
CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
parent. For this case, I think implementing all of virtio-vsock in
user space is the best path forward. But in theory, you could also
swizzle CIDs to make random "real" CIDs appear as CID 3.


Thank you for clarifying the use cases!

Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
it's easier to go into user-space with vhost-user-vsock or the built-in
device.


Sorry, I believe I meant CID 2. Effectively for case 1, when a process on the hypervisor listens on port 1234, it should be visible as 3:1234 from the VM and when the hypervisor process connects to <VM CID>:1234, it should look as if that connection came from CID 3.
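
A minimal sketch of the case 1 mapping described above, assuming a simplified packet header with only the addressing fields. The `VsockHeader` type and both function names are hypothetical and exist only for illustration; this is not how vhost-vsock or any existing device implements it.

```python
from dataclasses import dataclass, replace

VMADDR_CID_HOST = 2  # well-known host CID in AF_VSOCK
PARENT_CID = 3       # CID the enclave expects its parent to have


@dataclass(frozen=True)
class VsockHeader:
    """Hypothetical, simplified view of a vsock packet's addressing fields."""
    src_cid: int
    dst_cid: int
    src_port: int
    dst_port: int


def from_enclave(hdr):
    """Enclave -> host: traffic aimed at the parent (CID 3) lands on the host."""
    if hdr.dst_cid == PARENT_CID:
        return replace(hdr, dst_cid=VMADDR_CID_HOST)
    return hdr


def to_enclave(hdr):
    """Host -> enclave: host traffic appears to originate from the parent (CID 3)."""
    if hdr.src_cid == VMADDR_CID_HOST:
        return replace(hdr, src_cid=PARENT_CID)
    return hdr
```

With an arbitrary enclave CID of 16, a packet from 16 to 3:1234 would be delivered to 2:1234 on the host, and a host packet from 2 to 16:1234 would be seen by the enclave as coming from CID 3. Ports are left untouched.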


Maybe initially with vhost-user-vsock it's easier because we already
have something that works and supports sibling communication (for case
2).


The problem with vhost-user-vsock is that you don't get to use AF_VSOCK as a host process.

A typical Nitro Enclaves application is split into 2 parts: an in-Enclave component that listens/connects to vsock and a parent process that listens/connects to vsock. The experience of launching an Enclave is very similar to launching a QEMU VM: You run nitro-cli and tell it to pop up the Enclave based on an EIF file. Nitro-cli then tells you the CID that was allocated for the Enclave and you communicate with it using that.

What I would ideally like to have as development experience is that you run QEMU with unmodified Enclave components (the EIF file) and run your parent application unmodified on the host.

For that to work, the host application needs to be able to use AF_VSOCK.
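
To illustrate what "using AF_VSOCK as a host process" means for the parent side, here is a rough sketch of a heartbeat responder on port 9000 (the port mentioned earlier in the thread). It assumes the heartbeat is a single byte that is echoed back unchanged; the wire format is an assumption, not something stated in this thread. The function names are hypothetical.

```python
import socket

HEARTBEAT_PORT = 9000  # port named earlier in the thread


def answer_heartbeat(conn):
    """Read one heartbeat byte (assumed format) and echo it back unchanged."""
    data = conn.recv(1)
    if data:
        conn.sendall(data)
    return data


def serve_once(port=HEARTBEAT_PORT):
    """Accept a single vsock connection from any CID and answer its heartbeat.

    Requires Linux with vsock support; returns the CID of the peer.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as srv:
        srv.bind((socket.VMADDR_CID_ANY, port))
        srv.listen(1)
        conn, (peer_cid, _peer_port) = srv.accept()
        with conn:
            answer_heartbeat(conn)
        return peer_cid
```

Keeping `answer_heartbeat` independent of the socket family means the same parent code could be exercised over a Unix socket pair during testing, while `serve_once` is the part that only works if the host application can bind an AF_VSOCK socket.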


I agree that for this conversation, we should just ignore case 2 and consider it as "solved" through vhost-user-vsock, as that can create its own CID namespace between different VMs.





Do you have to allocate 2 separate virtio-vsock devices, one for the
parent and one for the enclave?


If there is a parent VM, then I guess both parent and enclave VMs need
virtio-vsock devices.

(set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
and from N to 3 on responses. This will enable users of the
Will these messages have the VMADDR_FLAG_TO_HOST flag set?

We don't support this in vhost-vsock yet, if supporting it helps, we
might, but we need to better understand how to avoid security issues, so
maybe each device needs to explicitly enable the feature and specify
from which CIDs it accepts packets.
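
One way to picture the opt-in idea above is a per-device policy that is off by default and carries an explicit list of accepted source CIDs. This is purely a sketch of the concept; `ForwardPolicy` and its fields are hypothetical and do not correspond to any existing vhost-vsock interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ForwardPolicy:
    """Hypothetical per-device opt-in; not an existing vhost-vsock interface."""
    enabled: bool = False  # forwarding stays off unless explicitly enabled
    allowed_src_cids: frozenset = field(default_factory=frozenset)

    def accepts(self, src_cid):
        """Accept a forwarded packet only if enabled and the source is listed."""
        return self.enabled and src_cid in self.allowed_src_cids
```

A device created with `ForwardPolicy()` would reject everything, and one created with `ForwardPolicy(enabled=True, allowed_src_cids=frozenset({16}))` would accept forwarded packets from CID 16 only.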


I don't know about the flag. So I don't know if it will be set. Sorry.


From the guest's point of view, the parent (CID 3) is just another VM.
Since Linux as of

 https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@xxxxxxxxxx/#2594117

always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I
would say the message has the flag set.

How would you envision the host to implement the flag? Would the host
allow user space to listen on any CID and hence receive the respective
target connections? And wouldn't listening on CID 0 then mean you're
effectively listening to "any" other CID? Thinking about that a bit
more, that may be just what we need, yes :)

No, wait. I had assumed the flag was only meant to implement sibling
communication, so the host doesn't re-forward those packets to sockets
opened by applications in the host, but only to other VMs on the same
host. So the host would always only have CID 2 assigned (CID 0 is not
supported by vhost-vsock).





nitro-enclave machine
type in QEMU to run the necessary vsock servers/clients on the host machine (some defaults can be implemented in QEMU as well, for example, sending a reply to the heartbeat), which spares them the cumbersome step of running another whole VM with CID 3. This way, users of the nitro-enclave machine type in QEMU could potentially also run multiple enclaves with their messages for CID 3 forwarded to different CIDs, which, on the QEMU side, could then be specified using a new machine type option (parent-cid) if implemented. I guess on the QEMU side, this will be an ioctl call (or some other way) to indicate to the host kernel that
the CID 3 messages need to be forwarded. Does this approach of
What if there is already a VM with CID = 3 in the system?


Good question! I don't know what should happen in this case.


See case 2 above :). In a nutshell, I don't think it'd be legal to
have a real CID 3 in that scenario.

Yeah, with vhost-vsock we can't, but with vhost-user-vsock I think it is
fine since the guest CID is local to each instance. The host only sees
the unix socket (like with firecracker).


See above why a unix socket is not really great CX :)







forwarding CID 3 messages to another CID sound good?
It seems too specific a case, if we can generalize it maybe we could
make this change, but we would like to avoid complicating vhost-vsock
and keep it as simple as possible to avoid then having to implement
firewalls, etc.

So first I would see if vhost-user-vsock or the QEMU built-in device is
right for this use-case.
Thank you! I will check everything out and reach out if I need
further guidance about what needs to be done. And sorry as I wasn't
able to answer some of your questions.


As mentioned above, I think there is merit for both. I personally care
a lot more for case 1 over case 2: We already have a working
implementation of Nitro Enclaves in a Cloud setup. What is missing is
a way to easily run a Nitro Enclave locally for development.

If both are fine, then I would go more on modifying vhost-user-vsock or
adding a built-in device in QEMU.
We have more freedom there, and it is also easier to update/debug.


I agree on those points, but if we go down that route users can't simply reuse their existing code, no? At that point, they're probably better off just spawning another (micro)-VM on CID 3, as that at least gives them the ability to reuse their existing parent code.


Alex





Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597



