Re: [RFC PATCH 00/10] KVM: selftests: Add support for test-selectable ucall implementations

Michael Roth <michael.roth@xxxxxxx> · Wed, 5 Jan 2022 11:02:44 -0600

On Wed, Jan 05, 2022 at 12:17:33AM +0000, Sean Christopherson wrote:
> On Tue, Jan 04, 2022, Michael Roth wrote:
> > On Thu, Dec 30, 2021 at 09:11:12PM +0000, Sean Christopherson wrote:
> > > not in-kernel.  That is bound to bite someone.  The only issue with SEV is the
> > > address, not the VM-Exit mechanism.  That doesn't change with SEV-ES, SEV-SNP,
> > > or TDX, as PIO and HLT will both get reflected as #VC/#VE, i.e. the guest side
> > > needs to be updated to use VMGEXIT/TDCALL no matter what, at which point having
> > > the hypercall request PIO emulation is just as easy as requesting HLT.
> > 
> > I'm not aware of any #VC handling needed for HLT in the case of
> > SEV-ES/SEV-SNP. That was one of the reasons for the SEV tests using
> > this ucall implementation.
> 
> Ah, you're right, HLT is an "automatic" exit and the CPU takes care of adjusting
> RIP.  TDX is the only one that requires a hypercall.
> 
> > Of course, at some point, we'd want full support for PIO/MMIO/etc. in the #VC
> > handler, but it's not something I'd planned on adding until after the SEV-SNP
> > tests, since it seems like we'd need to import a bunch of intruction decoding
> > code from elsewhere in the kernel, which is a lot of churn that's not
> > immediately necessary for getting at least some basic tests in place. Since
> > the HLT implementation is only 20 lines of code it seemed like a reasonable
> > stop-gap until we start getting more CoCo tests in place. But the in-kernel
> > APIC issue probably needs more consideration...
> > 
> > Perhaps for *just* PIO, the intruction decoding can be open-coded so it
> > can be added to the initial #VC handler implementation, which would avoid the
> > need for HLT implementation. I'll take a look at that.
> 
> PIO shouldn't require instruction decoding or a #VC handler.  What I was thinking
> is that the guest in the selftest would make a direct #VMGEXIT/TDCALL to request
> PIO instead of executing an OUT.  

That seems like a nicer approach. But it sort of lends itself to having
test-specific ucall implementations in some form. How are you thinking
vm_create() should decide what implementation to use? With this series
in place it could be something like:

  vm_create(..., struct ucall_ops *ops)
    ucall_init_ops(ops)

and with the SEV selftests in their current form it would look something
like:

  sev_vm_create(...)
    vm_create_with_ucall(..., ops=ucall_ops_pio_vmgexit)
      ucall_init_ops(ops)

is that sort of what you're thinking, or something else?

> 
> > > I also don't like having to differentiate between a "shared" and "regular" ucall.
> > > I kind of like having to explicitly pass the ucall object being used, but that
> > > puts undue burden on simple single-vCPU tests.
> > 
> > I tried to avoid it, but I got hung up on that fact that pre-allocating
> > arrays/lists of ucall structs needs to be done for each VM, and so we'd
> > end up needing some way for a guest to identify which pool it's ucall
> > struct should be allocated from. But you've gotten around that by just
> > sync_global_to_guest()'ing for each pool at the time ucall_init() is
> > called, so the guest only ever sees it's particular pool. Then the switch
> > from writing GVA to writing GPA solves the translation problem. Nice.
> > 
> > > 
> > > The inability to read guest private memory is really the only issue, and that can
> > > be easily solved without completely revamping the ucall framework, and without
> > > having to update a huge pile of tests to make them place nice with private memory.
> > 
> > I think the first 5 patches in this series are still relevant cleanups
> > vs. having a complete standalone ucall implementation for each arch, and Andrew
> > has also already started looking at other header cleanups related to
> > patch #1, so maybe Paolo would still like to queue those. Would also
> > provide a better starting point for having a centralized allocator for
> > the ucall structs, which you hinted at wanting below.
> > 
> > But the subsequent patches that add the ucall_shared() interfaces should
> > probably be set aside for now in favor of your proposal.
> > 
> > > 
> > > This would also be a good opportunity to clean up the stupidity of tests having to
> > > manually call ucall_init(), drop the unused/pointless @arg from ucall_init(), and
> > > maybe even fix arm64's lurking landmine of not being SMP safe (the address is shared
> > > by all vCPUs).
> > 
> > I thought you *didn't* want to update a huge pile of tests :) I suppose
> > it's unavoidable, since with your proposal, having something like ucall_init()
> > being called at some point is required, as opposed to the current
> > implementation where it is optional. Are you intending to have it be
> > called automatically by vm_create*()?
> 
> Yeah, I was thinking it could be done at the lowest level vm_create() helper.
> We'll need to expand vm_create() (or add yet another layer to avoid modifying a
> pile of tests) to allow opting out of initializing ucall, e.g. sev_migrate_tests.c
> needs to create multiple concurrent VMs, but happily doesn't need ucall support.

Why does sev_migrate_tests need to opt out? Couldn't it use
ucall_ops_pio_vmgexit like that SEV case above?

I ask because there is a ucall() in the exception handling code where
some unhandled exceptions result in the guest automatically issuing a
ucall(UCALL_UNHANDLED), so even when tests don't use ucall() they
might still rely on it if they enable exception handling. So that might
be an argument for always setting up at least the default ucall_ops_pio
implementation and creating a pool just in case. (or an argument for
dropping the UCALL_HANDLED handling).

> 
> > > To reduce the burden on tests and avoid ordering issues with creating vCPUs,
> > > allocate a ucall struct for every possible vCPU when the VM is created and stuff
> > > the GPA of the struct in the struct itself so that the guest can communicate the
> > > GPA instead of the GVA.  Then confidential VMs just need to make all structs shared.
> > 
> > So a separate call like:
> > 
> >   ucall_make_shared(vm->ucall_list)
> > 
> > ? Might need some good documentation/assertions to make sure it gets
> > called at the right place for confidential VMs, and may need some extra
> > hooks in SEV selftest implementation for switching from private to shared
> > after the memory has already been allocated, but seems reasonable.
> 
> Again, I was thinking that it would be done unconditionally by ucall_init(), i.e.
> would be automatically handled by the selftest framework and would Just Work for
> individual tests.

Ok, I'll have to think that through more. Currently with the SEV
selftests as they we have:

  sev_vm_create(policy, npages)
    vm = vm_create(...)
    vm_set_memory_encryption(vm, encrypt_by_default, enc_bit)
    //vm_vaddr_alloc_shared() can be used now

The ucall struct allocations would need to go through
vm_vaddr_alloc_shared() to make sure the selftest library tracks/maps
the pages as shared, but that vm_set_memory_encryption() happens too
late if the ucall_init() stuff is done in vm_create(). It should be
possible to pass the vm_set_memory_encryption() arguments directly to
vm_create() to allow for what you're proposing, but I guess we'd need
a new vm_create() wrapper that handles both the
vm_set_memory_encryption() args, along with the ucall_ops above,
something like:

  sev_vm_create(policy, npages)
    vm = vm_create_coco(..., encrypt_by_default, enc_bit/shared_bit, ucall_ops)

Or were you thinking something else? Just trying to get an idea of how
this will all need to tie in with the SEV selftests and what needs to
change on that end.