On Thu, May 20, 2021, Sean Christopherson wrote:
> On Thu, May 20, 2021, Sean Christopherson wrote:
> > On Mon, May 17, 2021, Tom Lendacky wrote:
> > > On 5/14/21 6:06 PM, Peter Gonda wrote:
> > > > On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> > > >>
> > > >> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
> > > >> exit code and parameters fail. Since the VMGEXIT instruction can be issued
> > > >> from userspace, even though userspace (likely) can't update the GHCB,
> > > >> don't allow userspace to be able to kill the guest.
> > > >>
> > > >> Return a #GP request through the GHCB when validation fails, rather than
> > > >> terminating the guest.
> > > >
> > > > Is this a gap in the spec? I don't see anything that details what
> > > > should happen if the correct fields for NAE are not set in the first
> > > > couple paragraphs of section 4 'GHCB Protocol'.
> > >
> > > No, I don't think the spec needs to spell out everything like this. The
> > > hypervisor is free to determine its course of action in this case.
> >
> > The hypervisor can decide whether to inject/return an error or kill the guest,
> > but what errors can be returned and how they're returned absolutely needs to be
> > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec
> > is the logical place to define said ABI.
> >
> > For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is
> > completely nonsensical. As is, a Linux guest appears to blindly forward the #GP,
> > which means if something does go awry KVM has just made debugging the guest that
> > much harder, e.g. imagine the confusion that will ensue if the end result is a
> > SIGBUS to userspace on CPUID.
> >
> > There needs to be an explicit error code for "you gave me bad data", otherwise
> > we're signing ourselves up for future pain.
>
> More concretely, I think the best course of action is to define a new return code
> in SW_EXITINFO1[31:0], e.g. '2', with additional information in SW_EXITINFO2.
>
> In theory, an old-but-sane guest will interpret the unexpected return code as
> fatal to whatever triggered the #VMGEXIT, e.g. SIGBUS to userspace. Unfortunately
> Linux isn't sane because sev_es_ghcb_hv_call() assumes any non-'1' result means
> success, but that's trivial to fix and IMO should be fixed irrespective of where
> this goes.

One last thing (hopefully): Erdem pointed out that if the GHCB GPA (or any
dereferenced pointers within the GHCB) is invalid or is set to a private GPA
(mostly in the context of SNP) then the VMM will likely have no choice but to
kill the guest in response to #VMGEXIT.

It's probably a good idea to add a blurb in one of the specs explicitly calling
out that #VMGEXIT can be executed from userspace, and that before returning to
userspace the guest kernel must always ensure that the GHCB points at a legal GPA
_and_ all primary fields are marked invalid.
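
To make the return code idea concrete, here is a rough, standalone sketch of how
a guest could classify SW_EXITINFO1[31:0] instead of treating every non-'1' value
as success. The enum and function names are made up for illustration and are not
the kernel's actual definitions. The '2' value is only the proposal from this
thread, not something any spec defines today, and the meanings of '0' and '1'
reflect my reading of the current GHCB spec.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum ghcb_hv_result {
	GHCB_HV_OK,		/* 0: no action requested by the hypervisor */
	GHCB_HV_EXCEPTION,	/* 1: hypervisor requests exception injection */
	GHCB_HV_BAD_INPUT,	/* 2: proposed "you gave me bad data" code */
	GHCB_HV_UNKNOWN,	/* anything else: fatal to the request */
};

/*
 * Classify the low 32 bits of SW_EXITINFO1.  Unknown values are reported
 * as errors rather than being silently treated as success.
 */
static enum ghcb_hv_result ghcb_classify_result(uint64_t sw_exit_info_1)
{
	switch ((uint32_t)sw_exit_info_1) {
	case 0:
		return GHCB_HV_OK;
	case 1:
		return GHCB_HV_EXCEPTION;
	case 2:
		return GHCB_HV_BAD_INPUT;
	default:
		return GHCB_HV_UNKNOWN;
	}
}
```

With something along these lines, an unexpected code ends up as a failure of
whatever triggered the #VMGEXIT, which is the "old-but-sane guest" behavior
described above.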