On Fri, Apr 26, 2024 at 12:57:08PM -0700, Sean Christopherson wrote:
> On Fri, Apr 26, 2024, Michael Roth wrote:
> > On Wed, Apr 24, 2024 at 05:15:40PM -0700, Sean Christopherson wrote:
> > > On Sun, Apr 21, 2024, Michael Roth wrote:
> > > > These commands can be used to pause servicing of guest attestation
> > > > requests. This is useful when updating the reported TCB or signing key with
> > > > commands such as SNP_SET_CONFIG/SNP_COMMIT/SNP_VLEK_LOAD, since they may
> > > > in turn require updates to userspace-supplied certificates, and if an
> > > > attestation request happens to be in-flight at the time those updates
> > > > are occurring there is potential for a guest to receive a certificate
> > > > blob that is out of sync with the effective signing key for the
> > > > attestation report.
> > > >
> > > > These interfaces also provide some versatility with how similar
> > > > firmware/certificate update activities can be handled in the future.
> > >
> > > Wait, IIUC, this is using the kernel to get two userspace components to not
> > > stomp over each other. Why is this the kernel's problem to solve?
> >
> > It's not that they are stepping on each other, but that kernel and
> > userspace need to coordinate on updating 2 components whose updates need
> > to be atomic from a guest perspective. Take an update to VLEK key for
> > instance:
> >
> >  1) management gets a new VLEK endorsement key from KDS along with
>
> What is "management"? I assume it's some userspace daemon?

It could be a daemon, depending on the cloud provider, but the main
example we have in mind is something more basic like virtee[1] being
used to interactively perform an update at the command line. E.g. you
point it at the new VLEK and the new cert, and it will handle updating
the certs at some known location and issuing the SNP_VLEK_LOAD command.
With this interface, it can take the additional step of PAUSE'ing
attestations before performing either update, to keep the 2 actions in
sync from the guest's point of view.

[1] https://github.com/virtee/snphost

>
> >     associated certificate chain
> >  2) management uses SNP_VLEK_LOAD to update key
> >  3) management updates the certs at the path VMM will grab them
> >     from when the EXT_GUEST_REQUEST userspace exit is issued
> >
> > If an attestation request comes in after 2), but before 3), then the
> > guest sees an attestation report signed with the new key, but still
> > gets the old certificate.
> >
> > If you reverse the ordering:
> >
> >  1) management gets a new VLEK endorsement key from KDS along with
> >     associated certificate chain
> >  2) management updates the certs at the path VMM will grab them
> >     from when the EXT_GUEST_REQUEST userspace exit is issued
> >  3) management uses SNP_VLEK_LOAD to update key
> >
> > then an attestation request between 2) and 3) will result in the guest
> > getting the new cert, but getting an attestation report signed with an old
> > endorsement key.
> >
> > Providing a way to pause guest attestation requests prior to 2), and
> > resume after 3), provides a straightforward way to make those updates
> > atomic to the guest.
>
> Assuming "management" is a userspace component, I still don't see why this
> requires kernel involvement. "management" can tell VMMs to pause attestation
> without having to bounce through the kernel. It doesn't even require a push

That would mean a tool like virtee above would need to issue kernel
commands like SNP_VLEK_LOAD to handle the key update, then implement
some VMM-specific hook to pause servicing of EXT_GUEST_REQ (or whatever
we end up calling it). QEMU could define events for this, libvirt could
implement them, and virtee could interact with libvirt to issue them in
place of the PAUSE/RESUME approach here.
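To make the two orderings above concrete, here's a toy Python model of
the race. It's purely illustrative: Host, run_update() and the
"old"/"new" labels are hypothetical stand-ins, not kernel, KVM or VMM
code. It also assumes that a request arriving while attestation is
paused gets bounced back to the guest as "busy" and retried later,
rather than being serviced with stale data.

```python
# Toy model of the VLEK/certificate update race. All names here are
# hypothetical illustrations, not kernel/KVM/VMM interfaces.


class Host:
    def __init__(self):
        self.signing_key = "old"  # key the firmware signs reports with
        self.cert_blob = "old"    # certs the VMM returns for an extended guest request
        self.paused = False       # models the proposed PAUSE/RESUME state

    def attestation_request(self):
        """Return (report signing key, certs) as the guest would see them."""
        if self.paused:
            # Assumption: a paused request is reported to the guest as
            # "busy" and retried later instead of being serviced.
            return None
        return (self.signing_key, self.cert_blob)


def run_update(pause, key_first=True):
    """Perform steps 2) and 3), with one guest request after each step."""
    host = Host()
    seen = []

    def guest_probe():
        result = host.attestation_request()
        if result is not None:  # busy requests are not serviced here
            seen.append(result)

    def load_key():
        host.signing_key = "new"  # stands in for SNP_VLEK_LOAD

    def update_certs():
        host.cert_blob = "new"    # stands in for rewriting the cert file

    steps = [load_key, update_certs] if key_first else [update_certs, load_key]

    if pause:
        host.paused = True
        guest_probe()
    for step in steps:
        step()
        guest_probe()
    if pause:
        host.paused = False
        guest_probe()  # the guest's retry after RESUME
    return seen


def mismatches(seen):
    """Serviced requests whose signing key and certs are out of sync."""
    return [(key, cert) for key, cert in seen if key != cert]


if __name__ == "__main__":
    for pause in (False, True):
        for key_first in (True, False):
            bad = mismatches(run_update(pause, key_first))
            print(f"pause={pause} key_first={key_first} mismatches={bad}")
```

Without the pause, each ordering leaves exactly one window where the
guest gets a report and certs from different generations. With PAUSE
before 2) and RESUME after 3), the only request that is actually
serviced sees the new key and the new certs together.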
But SNP libvirt support is a ways out, and a QEMU event mechanism for
this would be a pain to use directly, because you'd need some custom way
to enumerate all guests in order to issue the events to them. And maybe
the provider doesn't even use QEMU and has to invent something else.
Or they just decide to pause all guests before performing updates, but
that is still potentially a significant amount of downtime.

> without having to bounce through the kernel. It doesn't even require a push
> model, e.g. wrap/redirect the certs with a file that has a "pause" flag and a
> sequence counter.

We could do something like flag the certificate file itself; it does
sound less painful than the above. But what defines that spec? GHCB
completely defines the current format of the certs blob, so if we wrap
that in another layer we need to extend the GHCB or have something else
be the authority on what that wrapper looks like, and tools like virtee
would need to be very selective about which VMMs they can claim to
support based on which file format those VMMs support... it just seems
like a significant and unnecessary pain that every userspace
implementation will need to go through to achieve the same basic
functionality.

With PAUSE/RESUME, tools like virtee can be completely VMM-agnostic, and
more highly-integrated daemon-based approaches can still benefit from a
common mechanism that doesn't require significant coordination with VMM
processes.

For something as important and basic as updating endorsement keys while
guests are running, it seems worthwhile to expose this minimal level of
control to userspace.

-Mike