On Tue, Aug 01, 2023 at 09:45:41AM +0800, Xiaoyao Li wrote:
> On 8/1/2023 12:51 AM, Daniel P. Berrangé wrote:
> > On Mon, Jul 31, 2023 at 12:21:42PM -0400, Xiaoyao Li wrote:
> > > This is the first RFC version of enabling KVM gmem[1] as the backend
> > > for private memory of KVM_X86_PROTECTED_VM.
> > >
> > > It adds the support to create a specific KVM_X86_PROTECTED_VM type VM,
> > > and introduces a 'private' property for memory backends. When the vm
> > > type is KVM_X86_PROTECTED_VM and the memory backend has 'private'
> > > enabled as below, it will call the KVM gmem ioctl to allocate private
> > > memory for the backend.
> > >
> > >     $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
> > >           -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
> > >           ...
> > >
> > > Unfortunately this patch series fails the boot of OVMF at a very early
> > > stage due to a triple fault, because KVM doesn't support emulating
> > > string IO to private memory. We leave it as an open to be discussed.
> > >
> > > The following design opens need to be discussed:
> > >
> > > 1. how to determine the vm type?
> > >
> > >    a. like this series, specify the vm type via the machine property
> > >       'kvm-type'
> > >    b. check the memory backends; if any backend has the 'private'
> > >       property set, the vm-type is set to KVM_X86_PROTECTED_VM.
> > >
> > > 2. whether the 'private' property is needed if we choose 1.b as the
> > >    design
> > >
> > >    with 1.b, QEMU can decide on its own whether a memory region needs
> > >    to be private (allocates a gmem fd for it) or not.
> > >
> > > 3. What is KVM_X86_SW_PROTECTED_VM going to look like? What is its
> > >    purpose and what are the requirements on it? I think these are
> > >    questions for the KVM folks rather than the QEMU folks.
> > >
> > > Any other idea/open/question is welcomed.
> > >
> > >
> > > Besides, the TDX QEMU implementation is based on this series to
> > > provide private gmem for TD private memory; it can be found at [2].
> > > And it can work with the corresponding KVM [3] to boot a TDX guest.
> >
> > We already have a general purpose configuration mechanism for
> > confidential guests. The -machine argument has a property
> > confidential-guest-support=$OBJECT-ID, for pointing to an
> > object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
> > interface in QEMU. This is implemented with SEV, PPC PEF
> > mode, and s390 protvirt.
> >
> > I would expect TDX to follow this same design, i.e.
> >
> >     qemu-system-x86_64 \
> >       -object tdx-guest,id=tdx0,..... \
> >       -machine q35,confidential-guest-support=tdx0 \
> >       ...
> >
> > and not require inventing the new 'kvm-type' attribute, at least.
>
> yes.
>
> TDX is initialized exactly as above.
>
> This RFC series introduces 'kvm-type' for KVM_X86_SW_PROTECTED_VM. It's
> my fault that I forgot to list the option of introducing a
> sw_protected_vm object with the CONFIDENTIAL_GUEST_SUPPORT interface.
> Thanks to Isaku for raising it:
> https://lore.kernel.org/qemu-devel/20230731171041.GB1807130@xxxxxxxxxxxxxxxxxxxxx/
>
> We can specify KVM_X86_SW_PROTECTED_VM this way:
>
>     qemu \
>       -object sw-protected,id=swp0,... \
>       -machine confidential-guest-support=swp0 \
>       ...
>
> > For the memory backend though, I'm not so sure - possibly that
> > might be something that still wants an extra property to identify
> > the type of memory to allocate, since we use memory-backend-ram
> > for a variety of use cases. Or it could be an entirely new object
> > type such as "memory-backend-gmem"
>
> What I want to discuss is whether we should provide an interface that
> allows users to configure which memory is/can be private. For example,
> QEMU could do it internally: if the user wants a confidential guest,
> QEMU allocates private gmem for normal RAM automatically.

I think handling it automatically simplifies things a good deal on the
QEMU side. I think it's still worthwhile to allow:

    -object memory-backend-memfd-private,...

because it provides a nice mechanism to set up a pair of shared/private
memfds, enabling hole-punching via fallocate() to avoid doubling memory
allocations for shared/private.
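To make the hole-punching point concrete, here's a minimal standalone
sketch (plain C against the Linux syscall interface, not actual QEMU
code; the names and sizes are made up for illustration) of what
discarding the shared copy of a converted range boils down to.
FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE releases the backing
pages of a memfd range without changing the file size:

    #define _GNU_SOURCE
    #include <fcntl.h>      /* fallocate(), FALLOC_FL_* */
    #include <sys/mman.h>   /* memfd_create(), MFD_CLOEXEC */
    #include <unistd.h>     /* ftruncate() */
    #include <stdio.h>
    #include <stdlib.h>

    /* Release the backing pages for [offset, offset+len) while keeping
     * the file size intact; the range reads back as zeroes afterwards. */
    static int discard_range(int fd, off_t offset, off_t len)
    {
        return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, len);
    }

    int main(void)
    {
        const off_t ram_size = 16 * 1024 * 1024;
        int shared_fd = memfd_create("guest-ram-shared", MFD_CLOEXEC);

        if (shared_fd < 0 || ftruncate(shared_fd, ram_size) < 0) {
            perror("shared memfd setup");
            return EXIT_FAILURE;
        }

        /* Pretend the guest converted the first 2MB to private: punch
         * out the shared copy so the shared/private pair only backs one
         * copy of the range. The private side (a gmem fd in the real
         * flow) would be discarded the same way on a private->shared
         * conversion. */
        if (discard_range(shared_fd, 0, 2 * 1024 * 1024) < 0) {
            perror("fallocate(PUNCH_HOLE)");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }

Whether the backend wires it up exactly this way is beside the point;
the sketch is just the fallocate() mechanics that keep the pair from
backing both copies at once.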
It's also a nice place to control potentially-configurable things like:

 - whether or not to enable discard/hole-punching
 - if discard is enabled, whether or not to register the range via the
   RamDiscardManager interface so that VFIO/IOMMU mappings get updated
   when doing PCI passthrough. SNP relies on this for PCI passthrough
   when discard is enabled, otherwise DMA occurs to stale mappings of
   discarded bounce-buffer pages:

   https://github.com/AMDESE/qemu/blob/snp-latest/backends/hostmem-memfd-private.c#L449

But for other memory ranges, it doesn't do a lot of good to rely on
users to control those via -object memory-backend-memfd-private, since
QEMU will set up some regions internally, like the UEFI ROM.

It also isn't ideal for QEMU itself to internally control what
should/shouldn't be set up with a backing guest_memfd, because some
guest kernels do weird stuff, like scanning for ROM regions in areas
they might have mapped as encrypted in the guest page table. You can
consider these to be guest bugs, but even current SNP-capable kernels
exhibit this behavior, and if the guest wants to do dumb stuff QEMU
should let it.

But for these latter 2 cases, it doesn't make sense to attempt any sort
of discarding of backing pages, since discarding ROM pages makes no
sense anyway.

So I think it makes sense to just set up the gmemfd automatically
across the board internally, and keep memory-backend-memfd-private
around purely as a way to control/configure discardable memory.

-Mike