On Fri, Nov 01, 2024 at 01:53:26PM -0700, Dionna Amalie Glaze wrote: > On Mon, Oct 28, 2024 at 11:20 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > > On Fri, Sep 13, 2024, Dionna Amalie Glaze wrote: > > > We can extend the ccp driver to, on extended guest request, lock the > > > command buffer, get the REPORTED_TCB, complete the request, unlock the > > > command buffer, and return both the response and the REPORTED_TCB at > > > the time of the request. > > > > Holding a lock across an exit to userspace seems wildly unsafe. > > I wasn't suggesting this. I was suggesting adding a special ccp symbol > that would perform two sev commands under the same lock to ensure we > know the REPORTED_TCB that was used to derive the VCEK that signs an > attestation report in the MSG_REPORT_REQ guest request. We use that > atomicity to be sure that when we exit to user space to request > certificates that we're getting the right version certificates. > > > > > Can you explain the race that you are trying to close, with the exact "bad" sequence > > of events laid out in chronological order, and an explanation of why the race can't > > be sovled in userspace? I read through your previous comment[*] (which I assume > > is the race you want to close?), but I couldn't quite piece together exactly what's > > broken. Hi Dionna, > > 1. the control plane delivers a firmware update. Current TCB version > goes up. The machine signals that it needs new certificates before it > can commit. > 2. VM performs an extended guest request. > 3. KVM exits to user space to get certificates before getting the > report from firmware. > 4. [what I understand Michael Roth was suggesting] User space grabs a > file lock to see if it can read the cached certificates. It reads the > certificates and releases the lock before returning to KVM. > 5. the control plane delivers the certificates to the machine and > tells it to commit. The machine grabs the certificate file lock, runs > SNP_COMMIT, and releases the file lock. This command updates both > COMMITTED_TCB and REPORTED_TCB. > 6. KVM asks firmware to complete the MSG_REPORT_REQ request, but it's > a different REPORTED_TCB. > 7. Guest receives the wrong certificates for certifying the report it > just received. > > The fact that 4 has to release the lock before getting the attestation > report is the problem. We wouldn't actually release the lock before getting the attestation report. There's more specifics on the suggested flow in the documentation update accompanying this patch: + NOTE: In the case of SEV-SNP, the endorsement key used by firmware may + change as a result of management activities like updating SEV-SNP firmware + or loading new endorsement keys, so some care should be taken to keep the + returned certificate data in sync with the actual endorsement key in use by + firmware at the time the attestation request is sent to SNP firmware. The + recommended scheme to do this is: + + - The VMM should obtain a shared or exclusive lock on the path the + certificate blob file resides at before reading it and returning it to + KVM, and continue to hold the lock until the attestation request is + actually sent to firmware. To facilitate this, the VMM can set the + ``immediate_exit`` flag of kvm_run just after supplying the certificate + data, and just before and resuming the vCPU. This will ensure the vCPU + will exit again to userspace with ``-EINTR`` after it finishes fetching + the attestation request from firmware, at which point the VMM can + safely drop the file lock. + + - Tools/libraries that perform updates to SNP firmware TCB values or + endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``, + ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see + Documentation/virt/coco/sev-guest.rst for more details) in such a way + that the certificate blob needs to be updated, should similarly take an + exclusive lock on the certificate blob for the duration of any updates + to endorsement keys or the certificate blob contents to ensure that + VMMs using the above scheme will not return certificate blob data that + is out of sync with the endorsement key used by firmware. So #5 would not be able to obtain an exclusive file lock until userspace receives confirmation that the attestation request was processed by firmware. At that point it will be an accurate reflection of the attestation state associated with that particular version of the certificates that was fetched from userspace. So at that point the, transaction is done at that point and userspace can safely release the lock. -Mike > If we instead get the report and know what the REPORTED_TCB was when > serving that request, then we can exit to user space requesting the > certificates for the report in hand. > A concurrent update can update the reported_tcb like in the above > scenario, but it won't interfere with certificates since the machine > should have certificates for both TCB_VERSIONs to provide until the > commit is complete. > > I don't think it's workable to have 1 grab the file lock and for 5 to > release it. Waiting for a service to update stale certificates should > not block user attestation requests. It would make 4's failure to get > the lock return VMM_BUSY and eventually cause attestations to time out > in sev-guest. > > > > > [*] https://lore.kernel.org/all/CAAH4kHb03Una2kcvyC3W=1ZfANBWF_7a7zsSmWhr_r9g3rCDZw@xxxxxxxxxxxxxx > > > > -- > -Dionna Glaze, PhD, CISSP, CCSP (she/her)