On Mon, Jun 13, 2022 at 12:49:28PM -0500, Michael Roth wrote:
> On Fri, Jun 10, 2022 at 02:01:41PM -0700, Vishal Annapurve wrote:
> > ....
> > >
> > > I ended up adding a KVM_CAP_UNMAPPED_PRIVATE_MEM to distinguish between
> > > the 2 modes. With UPM-mode enabled it basically means KVM can/should
> > > enforce that all private guest pages are backed by private memslots, and
> > > enable a couple platform-specific hooks to handle MAP_GPA_RANGE, and
> > > queries from the MMU on whether or not an NPT fault is for a private
> > > page. SEV uses these hooks to manage its encryption bitmap, and uses
> > > that bitmap as the authority on whether or not a page is encrypted. SNP
> > > uses GHCB page-state-change requests, so MAP_GPA_RANGE is a no-op there,
> > > but it uses the MMU hook to indicate whether a fault is private based on
> > > the page fault flags.
> > >
> > > When UPM-mode isn't enabled, MAP_GPA_RANGE just gets passed on to
> > > userspace as before, and the platform-specific hooks above are no-ops.
> > > That's the mode your SEV self-tests ran in initially. I added a test
> > > that runs PrivateMemoryPrivateAccess in UPM-mode, where the guest's OS
> > > memory is also backed by a private memslot and the platform hooks are
> > > enabled, and things seem to still work okay there. So far I've only
> > > added a UPM-mode test for PrivateMemoryPrivateAccess though. I suppose
> > > we'd want to make sure everything works exactly as it did with UPM-mode
> > > disabled, but I don't see why it wouldn't.
> >
> > Thanks Michael for the update. Yeah, using the bitmap to track
> > private/shared-ness of gfn ranges should be a better way to go compared
> > to the limited approach I used of just tracking a single contiguous pfn
> > range.
> >
> > I spent some time getting the SEV/SEV-ES priv memfd selftests to execute
> > from the private fd as well, and ended up making similar changes as part
> > of this github tree:
> > https://github.com/vishals4gh/linux/commits/sev_upm_selftests_rfc_v2
> >
> > >
> > > But probably worth having some discussion on how exactly we should
> > > define this mode, and whether that meshes with what the TDX folks are
> > > planning.
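(As an aside, since it's easier to discuss this with something concrete on
the list: the 2 platform-specific hooks described above boil down to
something like the sketch below. The names and signatures here are my
approximations for illustration, not the actual interface in the tree.)

    struct kvm_x86_upm_hooks {
            /*
             * Invoked for KVM_HC_MAP_GPA_RANGE when UPM-mode is enabled.
             * SEV uses this to update its encryption bitmap; for SNP it
             * can be a no-op, since conversions arrive via GHCB
             * page-state-change requests instead.
             */
            int (*map_gpa_range)(struct kvm *kvm, gpa_t gpa, u64 npages,
                                 bool encrypted);

            /*
             * Queried by the MMU on an NPT fault to decide whether the
             * fault is for a private page. SEV answers from the
             * encryption bitmap; SNP derives it from the page fault
             * error code.
             */
            bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa,
                                     u64 error_code);
    };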
> > >
> > > I've pushed my UPM-mode selftest additions here:
> > > https://github.com/mdroth/linux/commits/sev_upm_selftests_rfc_v1_upmmode
> > >
> > > And the UPM SEV/SEV-SNP tree I'm running them against (DISCLAIMER:
> > > EXPERIMENTAL):
> > > https://github.com/mdroth/linux/commits/pfdv6-on-snpv6-upm1
> > >
> >
> > Thanks for the references here. This helps give a clear picture of the
> > status of priv memfd integration with SEV-SNP VMs, and this work will be
> > the base for future SEV-specific priv memfd selftest patches as things
> > get more stable.
> >
> > I see pwrite being used to populate the initial private memory contents.
> > Does it make sense to have SEV_VM_LAUNCH_UPDATE_DATA handle the private
> > fd population as well? I tried to prototype it via:
> > https://github.com/vishals4gh/linux/commit/c85ee15c8bf9d5d43be9a34898176e8230a3b680
>
> Thanks for the pointer and for taking a stab at this approach (I hadn't
> realized you were looking into this, so sorry for the overlap with your
> code).
>
> > as I got this suggestion from Erdem Aktas (erdemaktas@google) while
> > discussing executing guest code from the private fd.
>
> The way we have the host patches implemented currently is sort of based
> around the idea that userspace handles all private/shared conversion via
> allocations/deallocations from the private backing store, since I thought
> that was one of the design goals. For SNP that means allocating a page
> from the backing store will trigger the hooks in the kernel needed to do
> additional bookkeeping like RMP updates and removing the page from the
> directmap, which I'm doing via a platform-specific callback I've added to
> the KVM memfile notifier.
>
> There was some talk of allowing a sort of pre-boot stage to the
> MFD_INACCESSIBLE protections, where writes would be allowed up until a
> certain point. The kernel hack to allow pwrite() was sort of a holdover
> for this support.
>
> Handling pre-population as part of SNP_LAUNCH_UPDATE seems sort of
> incompatible with this, since it reads from shared memory and writes into
> private memory.

Well, no, it wouldn't be, since your code handles it the same way as mine:
kvm_private_mem_get_pfn() allocates a page but doesn't generate a notifier
event, so you can defer things like RMP updates until after the memory is
populated. So this might be a reasonable approach as well.
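To spell out the ordering being described, in rough pseudocode
(kvm_private_mem_get_pfn() is the helper mentioned above; the rest of the
names and signatures are placeholders, not real code from either tree):

    static int snp_populate_priv_gfn(struct kvm *kvm,
                                     struct kvm_memory_slot *slot,
                                     gfn_t gfn, void *src)
    {
            kvm_pfn_t pfn;
            int order, ret;

            /*
             * Allocates the page from the private backing store, but
             * does NOT fire the memfile notifier, so no RMP update or
             * directmap removal has happened yet.
             */
            ret = kvm_private_mem_get_pfn(slot, gfn, &pfn, &order);
            if (ret)
                    return ret;

            /*
             * Stand-in for the real populate step: copy (or in-place
             * encrypt/measure, in the LAUNCH_UPDATE case) the initial
             * image while the page is still host-accessible...
             */
            memcpy(pfn_to_kaddr(pfn), src, PAGE_SIZE);

            /*
             * ...then do the deferred bookkeeping: RMP update,
             * directmap removal, etc.
             */
            return snp_make_page_private(kvm, gfn, pfn);
    }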
But still worth exploring if a more general KVM ioctl is the better approach.
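Purely as a strawman for what "more general" might look like (everything
below -- name, layout, ioctl number -- is made up to illustrate the idea):
userspace hands KVM a shared source buffer plus a target gpa range, and the
platform backend (SEV, SNP, TDX, ...) does the copy/conversion however it
needs to.

    struct kvm_populate_private_mem {
            __u64 gpa;      /* target guest-physical address */
            __u64 size;     /* number of bytes to populate */
            __u64 uaddr;    /* source buffer in userspace (shared) */
            __u64 flags;    /* e.g. whether to extend the measurement */
    };

    #define KVM_POPULATE_PRIVATE_MEM \
            _IOW(KVMIO, 0xff /* placeholder */, struct kvm_populate_private_mem)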