On Fri, Jul 26, 2024 at 03:15:01PM +0800, Binbin Wu wrote: > > > On 6/29/2024 8:36 AM, Michael Roth wrote: > > On Fri, Jun 28, 2024 at 01:08:19PM -0700, Sean Christopherson wrote: > > > On Wed, Jun 26, 2024, Michael Roth wrote: > > > > On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote: > > > > > On Fri, Jun 21, 2024, Michael Roth wrote: > > > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > > > > > index ecfa25b505e7..2eea9828d9aa 100644 > > > > > > --- a/Documentation/virt/kvm/api.rst > > > > > > +++ b/Documentation/virt/kvm/api.rst > > > > > > @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the > > > > > > primary storage for certain register types. Therefore, the kernel may use the > > > > > > values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. > > > > > > +:: > > > > > > + > > > > > > + /* KVM_EXIT_COCO */ > > > > > > + struct kvm_exit_coco { > > > > > > + #define KVM_EXIT_COCO_REQ_CERTS 0 > > > > > > + #define KVM_EXIT_COCO_MAX 1 > > > > > > + __u8 nr; > > > > > > + __u8 pad0[7]; > > > > > > + union { > > > > > > + struct { > > > > > > + __u64 gfn; > > > > > > + __u32 npages; > > > > > > + #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1 > > > > > > + #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31) > > > > > Unless I'm mistaken, these error codes are defined by the GHCB, which means the > > > > > values matter, i.e. aren't arbitrary KVM-defined values. > > > > They do happen to coincide with the GHCB-defined values: > > > > > > > > /* > > > > * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define > > > > * a GENERIC error code such that it won't ever conflict with GHCB-defined > > > > * errors if any get added in the future. > > > > */ > > > > #define SNP_GUEST_VMM_ERR_INVALID_LEN 1 > > > > #define SNP_GUEST_VMM_ERR_BUSY 2 > > > > #define SNP_GUEST_VMM_ERR_GENERIC BIT(31) > > > > > > > > and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are > > > > defined/documented without any reliance on the GHCB spec and are purely > > > > KVM-defined. I just didn't really see any reason to pick different > > > > numerical values since it seems like purposely obfuscating things for > > > For SNP. For other vendors, the numbers look bizarre, e.g. why bit 31? And the > > > fact that it appears to be a mask is even more odd. > > That's fair. Values 1 and 2 made sense so just re-use, but that results > > in a awkward value for _GENERIC that's not really necessary for the KVM > > side. > > > > > > no real reason. But the code itself doesn't rely on them being the same > > > > as the spec defines, so we are free to define these however we'd like as > > > > far as the KVM API goes. > > > > > I forget exactly what we discussed in PUCK, but for the error codes, I think KVM > > > > > should either define it's own values that are completely disconnected from any > > > > > "harware" spec, or KVM should very explicitly #define all hardware values and have > > > > I'd gotten the impression that option 1) is what we were sort of leaning > > > > toward, and that's the approach taken here. > > > > And if we expose things selectively to keep the ABI small, it's a bit > > > > awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs > > > > a way to indicate success/fail/ENOMEM. Which we have with > > > > (assuming 0==success): > > > > > > > > #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1 > > > > #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31) > > > > > > > > But the GHCB also defines other values like: > > > > > > > > #define SNP_GUEST_VMM_ERR_BUSY 2 > > > > > > > > which don't make much sense to handle on the userspace side and doesn't > > > Why not? If userspace is waiting on a cert update for whatever reason, why can't > > > it signal "busy" to the guest? > > My thinking was that userspace is free to take it's time and doesn't need > > to report delays back to KVM. But it would reduce the potential for > > soft-lockups in the guest, so it might make sense to work that into the > > API. > > > > But more to original point, there could be something added in the future > > that really has nothing to do with anything involving KVM<->userspace > > interaction and so would make no sense to expose to userspace. > > Unfortunately I picked a bad example. :) > > > > > > really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event, > > > > which is a separate/self-contained thing from the general guest request > > > > protocol. So would we expose that as ABI or not? If not then we end up > > > > with this weird splitting of code. And if yes, then we have to sort of > > > > give userspace a way to discover whenever new error codes are added to > > > > the GHCB spec, because KVM needs to understand these value too and > > > Not necessarily. So long as KVM doesn't need to manipulate guest state, e.g. to > > > set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to > > > care/know about the error codes. E.g. userspace could signal VMM_BUSY and KVM > > > would happily pass that to the guest. > > But given we already have an exception to that where KVM does need to > > intervene for certain errors codes like ERR_INVALID_LEN that require > > modifying guest state, it doesn't seem like a good starting position > > to have to hope that it doesn't happen again. > > > > It just doesn't seem necessary to put ourselves in a situation where > > we'd need to be concerned by that at all. If the KVM API is a separate > > and fairly self-contained thing then these decisions are set in stone > > until we want to change it and not dictated/modified by changes to > > anything external without our explicit consideration. > > > > I know the certs things is GHCB-specific atm, but when the certs used > > to live inside the kernel the KVM_EXIT_* wasn't needed at all, so > > that's why I see this as more of a KVM interface thing rather than > > a GHCB one. And maybe eventually some other CoCo implementation also > > needs some interface for fetching certificates/blobs from userspace > > and is able to re-use it still because it's not too SNP-specific > > and the behavior isn't dictated by the GHCB spec (e.g. > > ERR_INVALID_LEN might result in some other state needing to be > > modified in their case rather than what the GHCB dictates.) > > TDX GHCI does have a similar PV interface for TDX guest to get quota, i.e., > TDG.VP.VMCALL<GetQuote>. This GetQuote PV interface is designed to invoke > a request to generate a TD-Quote signing by a service hosting TD-Quoting > Enclave operating in the host environment for a TD Report passed as a > parameter by the TD. > And the request will be forwarded to userspace for handling. > > So like GHCB, TDX needs to pass a shared buffer to userspace, which is > specified by GPA and size (4K aligned) and get the error code from > userspace and forward the error code to guest. > > But there are some differences from GHCB interface. > 1. TDG.VP.VMCALL<GetQuote> is a a doorbell-like interface used to queue a > request. I.e., it is an asynchronous request. The error code represents > the status of request queuing, *not* the status of TD Quote generation.. > 2. Besides the error code returned by userspace for GetQuote interface, the > GHCI spec defines a "Status Code" field in the header of the shared > buffer. > The "Status Code" field is also updated by VMM during the real handling > of > getting quote (after TDG.VP.VMCALL<GetQuote> returned to guest). > After the TDG.VP.VMCALL<GetQuote> returned and back to TD guest, the TD > guest can poll the "Status Code" field to check if the processing is > in-flight, succeeded or failed. > Since the real handling of getting quota is happening in userspace, and > it will interact directly with guest, for TDX, it has to expose TDX > specific error code to userspace to update the result of quote > generation. > > Currently, TDX is about to add a new TDX specific KVM exit reason, i.e., > KVM_EXIT_TDX_GET_QUOTE and its related data structure based on a previous > discussion. https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@xxxxxxxxxx/ > For the error code returned by userspace, KVM simply forward the error code > to guest without further translation or handling. > > I am neutral to have a common KVM exit reason to handle both GHCB for > REQ_CERTS and GHCI for GET_QUOTE. But for the error code, can we uses > vendor > specific error codes if KVM cares about the error code returned by userspace > in vendor specific complete_userspace_io callback? A few weeks back we discussed during the PUCK call on whether it makes sense for use a common exit type for REQ_CERTS and TDX_GET_QUOTE, and due to the asynchronous/polling nature of TDX_GET_QUOTE, and the somewhat-particular file-locking requirements that need to be built into the REQ_CERTS handling, we'd decided that it's probably more trouble than it's worth to try to merge the 2. However, I'm still hoping that KVM_EXIT_COCO might still provide some useful infrastructure for introducing something like KVM_EXIT_COCO_GET_QUOTE that implements the TDX-specific requirements more directly. I've just submitted v2 of KVM_EXIT_COCO where the userspace-provided error codes are reworked to be less dependent on specific spec-defined values but instead relies on standard error codes that KVM can provide special handling for internally when needed: https://lore.kernel.org/kvm/20241119133513.3612633-1-michael.roth@xxxxxxx/ But I suppose in your case userspace would just return "SUCCESS"/0 and then all the vendor-specific values are mainly in relation to the "Status Code" field so it likely doesn't make a huge difference as far as what userspace passes back to KVM. Thanks, Mike > > BTW, here is the plan of 4 hypercalls needing to exit to userspace for > TDX basic support series: > TDG.VP.VMCALL<SetupEventNotifyInterrupt> > - Add a new KVM exit reason KVM_EXIT_TDX_SETUP_EVENT_NOTIFY > TDG.VP.VMCALL<GetQuote> > - Add a new KVM exit reason KVM_EXIT_TDX_GET_QUOTE > TDG.VP.VMCALL<MapGPA> > - Reuse KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE > TDG.VP.VMCALL<ReportFatalError> > - Reuse KVM_EXIT_SYSTEM_EVENT but add a new type > KVM_SYSTEM_EVENT_TDX_FATAL_ERROR > >