On Mon, Jan 08, 2024 at 10:02:50AM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 08, 2024 at 02:02:57PM +0800, Yan Zhao wrote:
> > On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote:
> > > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote:
> > > > This series allows user space to notify KVM of noncoherent DMA status
> > > > so as to let KVM honor guest memory types in specified memory slot
> > > > ranges.
> > > >
> > > > Motivation
> > > > ===
> > > > A virtio GPU device may want to configure GPU hardware to work in
> > > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches.
> > >
> > > Does this mean some DMA reads do not snoop the caches or does it
> > > include DMA writes not synchronizing the caches too?
> >
> > Both DMA reads and writes are not snooped.
>
> Oh that sounds really dangerous.

But the IOMMU for the Intel GPU does not do force-snoop, regardless of
whether KVM honors the guest memory type or not.

> > > > This is generally for performance reasons.
> > > > On certain platforms, GFX performance can improve by 20+% with DMAs
> > > > going through the noncoherent path.
> > > >
> > > > This noncoherent DMA mode works in the below sequence:
> > > > 1. Host backend driver programs hardware not to snoop memory of the
> > > >    target DMA buffer.
> > > > 2. Host backend driver instructs the guest frontend driver to program
> > > >    the guest PAT to WC for the target DMA buffer.
> > > > 3. Guest frontend driver writes to the DMA buffer without clflush.
> > > > 4. Hardware does noncoherent DMA to the target buffer.
> > > >
> > > > In this noncoherent DMA mode, both guest and hardware regard a DMA
> > > > buffer as not cached. So, if KVM forces the effective memory type of
> > > > this DMA buffer to be WB, hardware DMA may read incorrect data and
> > > > cause misc failures.
> > >
> > > I don't know all the details, but a big concern would be that the
> > > caches remain fully coherent with the underlying memory at any point
> > > where kvm decides to revoke the page from the VM.
> >
> > Ah, you mean, for page migration, the content of the page may not be
> > copied correctly, right?
>
> Not just migration. Any point where KVM revokes the page from the
> VM. Ie just tearing down the VM still has to make the cache coherent
> with physical or there may be problems.

I'm not sure which problem you are referring to when KVM revokes the page.
On the host side,
- If the host memory type is WB, as in the Intel GPU passthrough case, the
  mismatch can only happen when the guest memory type is UC/WC/WT/WP, all
  stronger than WB.
  So, even after KVM revokes the page, the host will not get stale data
  out of the cache later.
- If the host memory type is WC, as in the virtio GPU case, the page is
  still held on the virtio host side after KVM revokes it.
  Even though an uncooperative guest can leave wrong data in the page, the
  guest could achieve the same purpose in a more straightforward way, i.e.
  by writing wrong data to the page directly.
  So, I don't see a problem in this case either.

> > Currently in x86, we have 2 ways to let KVM honor guest memory types:
> > 1. through the KVM memslot flag introduced in this series, for virtio
> >    GPUs, at memslot granularity.
> > 2. through increasing the noncoherent DMA count, as is done in VFIO, for
> >    Intel GPU passthrough, for all guest memory.
>
> And where does all this fixup the coherency problem?
>
> > This page migration issue should not be the case for virtio GPU, as both
> > host and guest are synced to use the same memory type and actually the
> > pages are not anonymous pages.
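
As an aside, to make the effective-memory-type reasoning above concrete,
below is a standalone user-space sketch of the EPT combination rule I am
assuming (based on the SDM; effective_type() and the rest are illustration
only, not code from this series or from KVM): with a WB EPT memory type,
the guest PAT only takes effect when the "ignore PAT" (IPAT) bit is clear,
i.e. when KVM honors the guest type.

  #include <stdio.h>

  /* x86 PAT memory-type encodings: UC=0, WC=1, WT=4, WP=5, WB=6 */
  enum mtype { UC = 0, WC = 1, WT = 4, WP = 5, WB = 6 };

  static const char *name(enum mtype t)
  {
          switch (t) {
          case UC: return "UC";
          case WC: return "WC";
          case WT: return "WT";
          case WP: return "WP";
          case WB: return "WB";
          }
          return "?";
  }

  /*
   * Effective memory type of a guest access, assuming a WB EPT memory type:
   *  - IPAT set:   guest PAT is ignored, the access is WB (KVM forces WB).
   *  - IPAT clear: guest PAT is honored; combined with a WB EPT type the
   *                result is simply the guest PAT type (UC/WC/WT/WP/WB).
   */
  static enum mtype effective_type(int ipat, enum mtype guest_pat)
  {
          return ipat ? WB : guest_pat;
  }

  int main(void)
  {
          enum mtype guest = WC; /* guest PAT set to WC for the DMA buffer */

          printf("KVM forces WB (IPAT=1):    guest WC -> effective %s\n",
                 name(effective_type(1, guest)));
          printf("KVM honors guest (IPAT=0): guest WC -> effective %s\n",
                 name(effective_type(0, guest)));
          return 0;
  }

That is the combination behind the WB bullet above: when KVM forces WB, the
guest's WC request is silently upgraded to WB, which is exactly the mismatch
the cover letter describes.
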
>
> The guest isn't required to do this so it can force the cache to
> become incoherent.
>
> > > If you allow an incoherence of cache != physical then it opens a
> > > security attack where the observed content of memory can change when
> > > it should not.
> >
> > In this case, will this security attack impact other guests?
>
> It impacts the hypervisor potentially. It depends..

Could you elaborate on how it would impact the hypervisor?
We can try to fix it if it's really an issue.
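
If stale dirty cache lines at page-revoke time do turn out to be a real
issue, I would expect the fix to look roughly like the sketch below
(illustration only: flush_page_before_reuse() is a made-up name, and I am
assuming clflush_cache_range() is usable in that path; this is not code
from this series or from KVM):

  #include <linux/highmem.h>
  #include <asm/cacheflush.h>

  /*
   * Illustrative only: write back and invalidate the CPU cache lines
   * covering a page that may have been mapped with a non-WB memory type by
   * the guest, before the host frees or reuses it, so that cache contents
   * and physical memory cannot diverge at that point.
   */
  static void flush_page_before_reuse(struct page *page)
  {
          void *vaddr = kmap_local_page(page);

          clflush_cache_range(vaddr, PAGE_SIZE);
          kunmap_local(vaddr);
  }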