On Fri, Nov 1, 2024 at 3:04 PM Michael Roth <michael.roth@xxxxxxx> wrote: > > On Fri, Nov 01, 2024 at 01:53:26PM -0700, Dionna Amalie Glaze wrote: > > On Mon, Oct 28, 2024 at 11:20 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > > > > On Fri, Sep 13, 2024, Dionna Amalie Glaze wrote: > > > > We can extend the ccp driver to, on extended guest request, lock the > > > > command buffer, get the REPORTED_TCB, complete the request, unlock the > > > > command buffer, and return both the response and the REPORTED_TCB at > > > > the time of the request. > > > > > > Holding a lock across an exit to userspace seems wildly unsafe. > > > > I wasn't suggesting this. I was suggesting adding a special ccp symbol > > that would perform two sev commands under the same lock to ensure we > > know the REPORTED_TCB that was used to derive the VCEK that signs an > > attestation report in the MSG_REPORT_REQ guest request. We use that > > atomicity to be sure that when we exit to user space to request > > certificates that we're getting the right version certificates. > > > > > > > > Can you explain the race that you are trying to close, with the exact "bad" sequence > > > of events laid out in chronological order, and an explanation of why the race can't > > > be sovled in userspace? I read through your previous comment[*] (which I assume > > > is the race you want to close?), but I couldn't quite piece together exactly what's > > > broken. > > Hi Dionna, > > > > > 1. the control plane delivers a firmware update. Current TCB version > > goes up. The machine signals that it needs new certificates before it > > can commit. > > 2. VM performs an extended guest request. > > 3. KVM exits to user space to get certificates before getting the > > report from firmware. > > 4. [what I understand Michael Roth was suggesting] User space grabs a > > file lock to see if it can read the cached certificates. It reads the > > certificates and releases the lock before returning to KVM. > > 5. the control plane delivers the certificates to the machine and > > tells it to commit. The machine grabs the certificate file lock, runs > > SNP_COMMIT, and releases the file lock. This command updates both > > COMMITTED_TCB and REPORTED_TCB. > > 6. KVM asks firmware to complete the MSG_REPORT_REQ request, but it's > > a different REPORTED_TCB. > > 7. Guest receives the wrong certificates for certifying the report it > > just received. > > > > The fact that 4 has to release the lock before getting the attestation > > report is the problem. > > We wouldn't actually release the lock before getting the attestation > report. There's more specifics on the suggested flow in the documentation > update accompanying this patch: > > + NOTE: In the case of SEV-SNP, the endorsement key used by firmware may > + change as a result of management activities like updating SEV-SNP firmware > + or loading new endorsement keys, so some care should be taken to keep the > + returned certificate data in sync with the actual endorsement key in use by > + firmware at the time the attestation request is sent to SNP firmware. The > + recommended scheme to do this is: > + > + - The VMM should obtain a shared or exclusive lock on the path the > + certificate blob file resides at before reading it and returning it to > + KVM, and continue to hold the lock until the attestation request is > + actually sent to firmware. To facilitate this, the VMM can set the > + ``immediate_exit`` flag of kvm_run just after supplying the certificate > + data, and just before and resuming the vCPU. This will ensure the vCPU > + will exit again to userspace with ``-EINTR`` after it finishes fetching > + the attestation request from firmware, at which point the VMM can > + safely drop the file lock. > + > + - Tools/libraries that perform updates to SNP firmware TCB values or > + endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``, > + ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see > + Documentation/virt/coco/sev-guest.rst for more details) in such a way > + that the certificate blob needs to be updated, should similarly take an > + exclusive lock on the certificate blob for the duration of any updates > + to endorsement keys or the certificate blob contents to ensure that > + VMMs using the above scheme will not return certificate blob data that > + is out of sync with the endorsement key used by firmware. > > So #5 would not be able to obtain an exclusive file lock until userspace > receives confirmation that the attestation request was processed by > firmware. At that point it will be an accurate reflection of the > attestation state associated with that particular version of the > certificates that was fetched from userspace. So at that point the, > transaction is done at that point and userspace can safely release the lock. > Thanks for the clarification. I'll need to understand this pathway better in our VMM to test this patch series effectively. Will get back to you. > -Mike > > > If we instead get the report and know what the REPORTED_TCB was when > > serving that request, then we can exit to user space requesting the > > certificates for the report in hand. > > A concurrent update can update the reported_tcb like in the above > > scenario, but it won't interfere with certificates since the machine > > should have certificates for both TCB_VERSIONs to provide until the > > commit is complete. > > > > I don't think it's workable to have 1 grab the file lock and for 5 to > > release it. Waiting for a service to update stale certificates should > > not block user attestation requests. It would make 4's failure to get > > the lock return VMM_BUSY and eventually cause attestations to time out > > in sev-guest. > > > > > > > > [*] https://lore.kernel.org/all/CAAH4kHb03Una2kcvyC3W=1ZfANBWF_7a7zsSmWhr_r9g3rCDZw@xxxxxxxxxxxxxx > > > > > > > > -- > > -Dionna Glaze, PhD, CISSP, CCSP (she/her) -- -Dionna Glaze, PhD, CISSP, CCSP (she/her)