Re: [RFC 03/16] KVM: selftests: handle encryption bits in page tables

Michael Roth <michael.roth@xxxxxxx> · Sun, 24 Oct 2021 11:49:45 -0500

On Thu, Oct 21, 2021 at 05:26:26PM +0200, Paolo Bonzini wrote:
> On 06/10/21 01:44, Michael Roth wrote:
> > SEV guests rely on an encryption bit which resides within the range that
> > current code treats as address bits. Guest code will expect these bits
> > to be set appropriately in their page tables, whereas helpers like
> > addr_gpa2hva() will expect these bits to be masked away prior to
> > translation. Add proper handling for these cases.
> 
> This is not what you're doing below in addr_gpa2hva, though---or did I
> misunderstand?

The confusion is warranted, addr_gpa2hva() *doesn't* expect the C bit to
be masked in advance so the wording is pretty confusing.

I think I was referring to the fact that internally it doesn't need/want the
C-bit, in this case it just masks it away as a convenience to callers,
as opposed to the other functions modified in the patch that actually
make use of it.

It's convenient because page table walkers/mappers make use of
addr_gpa2hva() when translating PTEs to host addresses, and it silently
masks away the C-bit for them. We could easily convert those callers from:

  addr_gpa2hva(paddr)

to this:

  addr_gpa2hva(addr_raw2gpa(paddr))

but then all new code needs to consider whether it might be dealing with
C-bits prior to deciding to pass an address to addr_gpa2hva() (or not
really think about it, and add addr_raw2gpa() "just in case"). So since
it's always harmless to mask it away silently in addr_gpa2hva(), the
logic/code seems to benefit a good deal if we indicate clearly that
addr_gpa2hva() can accept a 'raw' GPA, and will ignore the C-bit completely.
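
As a rough sketch of the masking described above (not the actual patch:
the fixed bit position, the SKETCH_/sketch_ names, and the single flat
memory region are assumptions made only to keep the example
self-contained; addr_raw2gpa() here is just one possible shape for the
helper named above):

```c
#include <stdint.h>

typedef uint64_t vm_paddr_t;

/*
 * Assumption for illustration: C-bit at position 51. Real hardware
 * reports the position via CPUID 0x8000001F, EBX[5:0].
 */
#define SKETCH_C_BIT (1ULL << 51)

/* Strip the C-bit from a 'raw' GPA; a no-op if it is not set. */
static inline vm_paddr_t addr_raw2gpa(vm_paddr_t gpa_raw)
{
	return gpa_raw & ~SKETCH_C_BIT;
}

/*
 * Stand-in for the region lookup done by the real addr_gpa2hva():
 * one flat region starting at GPA 0, backed by 'host_base'.
 * The C-bit is masked internally, so callers may pass a raw GPA.
 */
static inline void *sketch_gpa2hva(uint8_t *host_base, vm_paddr_t gpa_raw)
{
	return host_base + addr_raw2gpa(gpa_raw);
}
```

With this shape, a raw and a non-raw GPA for the same page translate to
the same host address, which is why the masking is harmless for callers
that never had a C-bit to begin with.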

But not a big deal either way if you prefer to keep that explicit. And
the commit message still needs to be clarified.

> 
> I may be wrong due to not actually having written the code, but I'd prefer
> if most of these APIs worked only if the C bit has already been stripped.
> In general it's quite unlikely for host code to deal with C=1 pages, so it's
> worth pointing out explicitly the cases where it does.

I've tried to indicate functions that expect the C-bit by adding the 'raw_'
prefix to the gpa/paddr parameters, but as you pointed out with
addr_gpa2hva() it's already a bit inconsistent in that regard, and there's
a couple cases like virt_map() where I should use the 'raw_' prefix as well
that I've missed here.

So that should be addressed, and maybe some additional comments/assertions
might be warranted to guard against cases where the C-bit is passed in
unexpectedly.
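
Such a guard could look roughly like the following. This is only a
sketch: sketch_check_gpa() and SKETCH_C_BIT are hypothetical names, the
bit position is illustrative, and plain assert() stands in for the
TEST_ASSERT() the selftests would actually use:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t vm_paddr_t;

#define SKETCH_C_BIT (1ULL << 51)	/* illustrative position only */

/*
 * For APIs that require a non-raw GPA: fail loudly when the C-bit is
 * set instead of silently masking it away.
 */
static inline vm_paddr_t sketch_check_gpa(vm_paddr_t gpa)
{
	assert(!(gpa & SKETCH_C_BIT));
	return gpa;
}
```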

But I should probably re-assess why the C-bit is being passed around in
the first place:

 - vm_phy_page[s]_alloc() is the main 'source' for 'raw' GPAs with the
   C-bit set. It determines this based on the vm_memcrypt encryption
   policy, and updates the encryption bitmask as well.
 - vm_phy_page[s]_alloc() is callable both in kvm_util lib as well as
   individual tests.
 - in theory, encoding the C-bit in the returned vm_paddr_t means that
   vm_phy_page[s]_alloc() callers can pass that directly into
   virt_map/virt_pg_map() and this will "just work" for both
   encrypted/non-encrypted guests.
 - by masking it away in addr_gpa2hva(), existing tests/code flow mostly
   "just works" as well.

But taking a closer look, in cases where vm_phy_page[s]_alloc() is called
directly by tests, like set_memory_region_test, emulator_error_test, and
smm_test, that raw GPA is compared to hardcoded non-raw GPAs, so they'd
still end up needing fixups to work with the proposed transparent-SEV-mode
stuff. And future code would need to be written to account for this, so
it doesn't really "just work" after all..
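
To illustrate the mismatch (a sketch only: the GPA value, the bit
position, and the sketch_ helpers are hypothetical and not taken from
any of the tests named above):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vm_paddr_t;

#define SKETCH_C_BIT   (1ULL << 51)	/* illustrative position only */
#define SKETCH_MEM_GPA 0xc0000000ULL	/* hypothetical hardcoded GPA */

/* What such a test effectively does today: compare directly. */
static inline bool sketch_naive_match(vm_paddr_t from_alloc)
{
	return from_alloc == SKETCH_MEM_GPA;
}

/* The fixup each such test would need if allocations return raw GPAs. */
static inline bool sketch_fixed_match(vm_paddr_t from_alloc)
{
	return (from_alloc & ~SKETCH_C_BIT) == SKETCH_MEM_GPA;
}
```

The naive comparison only holds for non-encrypted guests; once the
allocator hands back a GPA with the C-bit set, it fails unless the test
strips the bit first.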

So it's worth considering the alternative approach of *not* encoding the
C-bit into GPAs returned by vm_phy_page[s]_alloc(). That would likely
involve introducing something like addr_gpa2raw(), which adds in the
C-bit according to the encryption bitmap as-needed. If we do that:

  - virt_map()/virt_pg_map() still need to accept 'raw' GPAs, since they
    need to deal with cases where pages are being mapped that weren't
    allocated by vm_phy_page[s]_alloc(), and so aren't recorded in the
    bitmap. In those cases it is up to test code to provide the C-bit
    when needed (e.g. things like separate linear mappings for pa()-like
    stuff in guest code).

  - for cases where vm_phy_page[s]_alloc() determines whether the page
    is encrypted, addr_gpa2raw() needs to be used to add back the C-bit
    prior to passing it to virt_map()/virt_pg_map(), both in the library and
    the test code. vm_vaddr_* allocations would handle all this under the
    covers as they do now.
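
A minimal sketch of what addr_gpa2raw() might look like under this
approach. Everything here is an assumption for illustration: the
struct layout, the 64-page single-word bitmap, and the sketch_ names
do not come from the patch, and the real helper would presumably take
a struct kvm_vm * instead:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vm_paddr_t;

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_NPAGES     64

/* Hypothetical stand-in for the per-VM encryption state. */
struct sketch_memcrypt {
	uint64_t c_bit_mask;	/* e.g. 1ULL << 51 */
	uint64_t enc_bitmap;	/* bit N set => page N is encrypted */
};

/* Was this page recorded as encrypted when it was allocated? */
static inline bool sketch_page_encrypted(const struct sketch_memcrypt *mc,
					 vm_paddr_t gpa)
{
	uint64_t pfn = gpa >> SKETCH_PAGE_SHIFT;

	return pfn < SKETCH_NPAGES && ((mc->enc_bitmap >> pfn) & 1);
}

/* Add the C-bit back for pages the allocator recorded as encrypted. */
static inline vm_paddr_t addr_gpa2raw(const struct sketch_memcrypt *mc,
				      vm_paddr_t gpa)
{
	return sketch_page_encrypted(mc, gpa) ? (gpa | mc->c_bit_mask) : gpa;
}
```

Pages that are not tracked in the bitmap come back unchanged, which
matches the first point above: for those, the caller still has to
supply the C-bit itself when it wants an encrypted mapping.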

So test code would need to consider cases where addr_gpa2raw() needs to be
used to set the C-bit (which is basically only when they want to mix usage
of the vm_phy_page[s]_alloc with their own mapping of the guest page tables,
which doesn't seem to be done in any existing tests anyway).

The library code would need these addr_gpa2raw() hooks in places where
it calls virt_*map() internally. Probably just a handful of places
though.

Assuming there's no issues with this alternative approach that I may be
missing, I'll look at doing it this way for the next spin.

Even in this alternative approach though, having addr_gpa2hva() silently
mask away the C-bit still seems useful for the reasons above, but again, no
strong feelings one way or the other on that.

> 
> Paolo
> 
> > @@ -1460,9 +1480,10 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> >    * address providing the memory to the vm physical address is returned.
> >    * A TEST_ASSERT failure occurs if no region containing gpa exists.
> >    */
> > -void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa)
> > +void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa_raw)
> >   {
> >   	struct userspace_mem_region *region;
>