Extending management apps using libvirt to support measured launch of
QEMU guests with SEV/SEV-ES is unreasonably complicated today, both
for the guest owner and for the cloud management apps. We have APIs
for exposing info about the SEV host, the SEV guest, guest
measurements and secret injections. This is a "bags of bits"
solution. We expect apps to then turn this into a user facing
solution. It is possible, but we're heading to a place where every
cloud mgmt app essentially needs to reinvent the same wheel, and the
guest owner will need to learn custom APIs for dealing with
SEV/SEV-ES for each cloud mgmt app. This is pretty awful. We need to
do a better job at providing a solution that is more general purpose
IMHO.

Consider a cloud mgmt app. Right now the flow to use the bag of bits
libvirt exposes looks something like:

 * Guest owner tells mgmt app they want to launch a VM
 * Mgmt app decides what host the VM will be launched on
 * Guest owner requests cert chain for the virt host from mgmt app
 * Guest owner validates cert chain for the virt host
 * Guest owner generates launch blob for the VM
 * Guest owner provides launch blob to the mgmt app
 * Management app tells libvirt to launch VM with blob, with CPUs in
   a paused state
 * Libvirt launches QEMU with CPUs stopped
 * Guest owner requests launch measurement from mgmt app
 * Guest owner validates measurement
 * Guest owner generates secret blob
 * Guest owner sends secret blob to management app
 * Management app tells libvirt to inject secrets
 * Libvirt injects secrets to QEMU
 * Management app tells libvirt to start QEMU CPUs
 * Libvirt tells QEMU to start CPUs

Compare to a non-confidential VM:

 * Guest owner tells mgmt app they want to launch a VM
 * Mgmt app decides what host the VM will be launched on
 * Mgmt app tells libvirt to launch VM with CPUs in running state
 * Libvirt launches QEMU with CPUs running

Now, of course the guest owner wouldn't be manually performing the
earlier steps; they would want some kind of
software to take care of this. No matter what, it still involves a
large number of back and forth operations between the guest owner &
mgmt app, and between the mgmt app and libvirt.

One of libvirt's key jobs is to isolate mgmt apps from differences in
behaviour of underlying hypervisor technologies, and we're failing at
that job with SEV/SEV-ES, because the mgmt app needs to go through a
multi-stage dance on every VM start that is different from what they
do with non-confidential VMs. It is especially unpleasant because
there needs to be a "wait state" between when the app selects a host
to deploy a VM on, and when it can actually start the VM. In essence
the app needs to reserve capacity on a host ahead of time for a VM
that will be created some arbitrary time later. This can have
significant implications for the mgmt app architectural design that
are not necessarily easy to address, when they expect to just call
virDomainCreate and have the VM running in one step.

It also harms interoperability of libvirt tools. For example, if a
mgmt tool like virt-manager/OpenStack created a VM using SEV, and you
want to start it manually using a different tool like 'virsh', you
enter a world of complexity and pain, due to the multi-step dance
required.

AFAICT, in all of this, the mgmt app is really acting as a conduit
and is not implementing any interesting logic. The clever stuff is
all the responsibility of the guest owner, and/or whatever software
for attestation they are using remotely.

I think there is scope for enhancing libvirt such that usage of
SEV/SEV-ES has little-to-no burden for the management apps, and much
less burden for guest owners. The key to achieving this is to define
a protocol for libvirt to connect to a remote service to handle the
launch measurements & secret acquisition. The guest owner can provide
the address of a service they control (or trust), and libvirt can
take care of all the interactions with it.
This frees both the user and mgmt app from having to know much about
SEV/SEV-ES, with the VM startup process being essentially the same as
it has always been. The sequence would look like:

 * Guest owner tells attestation service they intend to create a VM
   with a given UUID, policy, and any other criteria such as the cert
   of the cloud owner and valid OVMF firmware hashes, providing any
   needed LUKS keys
 * Guest owner tells mgmt app they want to launch a VM, using
   attestation service at https://somehost/and/url
 * Mgmt app decides what host the VM will be launched on
 * Mgmt app tells libvirt to launch VM with CPUs in running state

The next steps involve solely libvirt & the attestation service. The
mgmt app and guest owner have done their work.

 * Libvirt contacts the service, providing the certificate chain for
   the host to be used, the UUID of the guest, and any other required
   info about the host
 * Attestation service validates the cert chain to ensure it belongs
   to the cloud owner that was identified previously
 * Attestation service generates a launch blob and puts it in the
   response back to libvirt
 * Libvirt launches QEMU with CPUs paused
 * Libvirt gets the launch measurement and sends it to the
   attestation server, with any other required info about the VM
   instance
 * Attestation service validates the measurement
 * Attestation service builds the secret table with LUKS keys and
   puts it in the response back to libvirt
 * Libvirt injects the secret table to QEMU
 * Libvirt tells QEMU to start CPUs

All the same exchanges of information are present, but the management
app doesn't have to get involved. The guest owner also doesn't have
to get involved, except for a one-time setup step. The software the
guest owner uses for attestation also doesn't have to be written to
cope with talking to OpenStack, CNV and whatever other vendor
specific cloud mgmt apps exist today.
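To illustrate how little logic the libvirt side would actually need,
here is a rough Python sketch of its half of the exchange. This is
purely hypothetical: the endpoint paths and JSON field names mirror
the REST sketch later in this mail, the "qemu" object stands in for
libvirt's internal QEMU control, and none of this is an implemented
libvirt interface.

```python
import json
import urllib.request


def post_json(url, body):
    """POST a JSON body to the attestation service, decode the JSON reply."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def attested_launch(service, vmid, host_info, qemu, post=post_json):
    """Drive a measured SEV launch against a guest-owner attestation service.

    'service' is the base URL the guest owner handed to the mgmt app,
    'host_info' carries the host cert chain / PDH etc., and 'qemu' is a
    stand-in for libvirt's own QEMU control layer.
    """
    # 1. Ask permission to launch on this host; the service validates the
    #    cert chain and answers with the launch session blob.
    launch = post(f"{service}/vm/{vmid}/launch", host_info)

    # 2. Start QEMU with CPUs paused, using the returned launch blob.
    qemu.launch_paused(launch["session"], launch["owner-cert"])

    # 3. Send the launch measurement for validation; on success the reply
    #    carries the secret table to inject.
    reply = post(f"{service}/vm/{vmid}/validate",
                 {"measurement": qemu.get_measurement()})

    # 4. Inject the secrets and only then let the guest run.
    qemu.inject_secret(reply["secret-header"], reply["secret-table"])
    qemu.start_cpus()
```

The 'post' parameter is injectable only so the flow can be exercised
without a live service; the point is that the mgmt app appears nowhere
in it.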
This will significantly reduce the burden of supporting SEV/SEV-ES
launch measurement in libvirt based apps, and make SEV/SEV-ES guests
more "normal" from a mgmt POV.

What could this look like from the POV of an attestation server API,
if we assume an HTTPS REST service with a simple JSON payload...

* Guest Owner: Register a new VM to be booted:

    POST /vm/<UUID>
    Request body:
      {
        "scheme": "amd-sev",
        "cloud-cert": "certificate of the cloud owner that signs the PEK",
        "policy": 0x3,
        "cpu-count": 3,
        "firmware-hashes": [
          "xxxx",
          "yyyy"
        ],
        "kernel-hash": "aaaa",
        "initrd-hash": "bbbb",
        "cmdline-hash": "cccc",
        "secrets": [
          {
            "type": "luks-passphrase",
            "passphrase": "<blah>"
          }
        ]
      }

* Libvirt: Request permission to launch a VM on a host:

    POST /vm/<UUID>/launch
    Request body:
      {
        "pdh": "<blah>",
        "cert-chain": "<blah>",
        "cpu-id": "<CPU ID>",
        ...other relevant bits...
      }

    Service decides if the proposed host is acceptable.

    Response body (on success):
      {
        "session": "<blah>",
        "owner-cert": "<blah>",
        "policy": 3
      }

* Libvirt: Request secrets to inject to launched VM:

    POST /vm/<UUID>/validate
    Request body:
      {
        "api-minor": 1,
        "api-major": 2,
        "build-id": 241,
        "policy": 3,
        "measurement": "<blah>",
        "firmware-hash": "xxxx",
        "cpu-count": 3,
        ...other relevant stuff...
      }

    Service validates the measurement...

    Response body (on success):
      {
        "secret-header": "<blah>",
        "secret-table": "<blah>"
      }

So we can see there are only a couple of REST API calls we need to be
able to define. If we could do that, then creating a SEV/SEV-ES
enabled guest with libvirt would not involve anything more
complicated for the mgmt app than providing the URI of the guest
owner's attestation service and an identifier for the VM. ie the XML
config could be merely:

  <launchSecurity type="sev">
    <attestation vmid="57f669c2-c427-4132-bc7a-26f56b6a718c"
                 service="http://somehost/some/url"/>
  </launchSecurity>

And then invoke virDomainCreate as normal with any other libvirt /
QEMU guest. No special workflow is required by the mgmt app.
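For the guest owner, the one-time registration step is just a matter
of assembling that JSON document and POSTing it. A minimal sketch in
Python, assuming the hypothetical field names from the proposal above
(the builder function itself is an illustrative placeholder, not part
of any existing tool):

```python
import json

# Hypothetical builder for the guest owner's one-time registration body,
# i.e. the payload of POST /vm/<UUID> in the sketch above. Fields like
# kernel-hash / initrd-hash / cmdline-hash would follow the same pattern.
def registration_body(cloud_cert, policy, cpu_count,
                      firmware_hashes, luks_passphrase):
    return {
        "scheme": "amd-sev",
        "cloud-cert": cloud_cert,        # cert that signs the PEK
        "policy": policy,                # SEV guest policy bits
        "cpu-count": cpu_count,
        "firmware-hashes": firmware_hashes,  # acceptable OVMF builds
        "secrets": [
            {"type": "luks-passphrase", "passphrase": luks_passphrase},
        ],
    }

body = registration_body("-----BEGIN CERTIFICATE-----...", 0x3, 3,
                         ["xxxx", "yyyy"], "<blah>")
print(json.dumps(body, indent=2))  # what would go on the wire
```

The guest owner's tooling would then POST this to the service at
/vm/<UUID> before ever talking to the cloud mgmt app.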
There is a small extra task for the guest owner to register the
existence of their VM with the attestation service. Aside from that,
the only change to the way they interact with the cloud mgmt app is
to provide the VM ID and URI for the attestation service. No need to
learn custom APIs for each different cloud vendor for dealing with
fetching launch measurements or injecting secrets.

Finally, this attestation service REST protocol doesn't have to be
something controlled or defined by libvirt. I feel like it could be a
protocol that is defined anywhere, with libvirt merely being one
consumer of it. Other apps that directly use QEMU may also wish to
avail themselves of it. All that really matters from the libvirt POV
is:

 - The protocol definition exists to enable the above workflow, with
   a long term API stability guarantee that it isn't going to change
   in incompatible ways
 - There exists a fully open source reference implementation of
   sufficient quality to deploy in the real world

I know https://github.com/slp/sev-attestation-server exists, but its
current design has assumptions about it being used with libkrun
AFAICT. I have heard of others interested in writing similar servers,
but I've not seen code.

We are at a crucial stage where mgmt apps are looking to support
measured boot with SEV/SEV-ES, and if we delay they'll all go off and
do their own thing, and it'll be too late, leading to
https://xkcd.com/927/. Especially for apps using libvirt to manage
QEMU, I feel we have a few months' window of opportunity to get such
a service available, before they all end up building out APIs for the
tedious manual workflow, reinventing the wheel.

Regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org          -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org     -o-    https://www.instagram.com/dberrange :|