Re: [PATCH 3/4] arm/arm64: KVM: Flush caches to memory on unmap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 12, 2015 at 09:13:52PM +0100, Christoffer Dall wrote:
> On Mon, Jan 12, 2015 at 11:15:34AM +0000, Marc Zyngier wrote:
> > On 11/01/15 12:30, Christoffer Dall wrote:
> > > On Fri, Jan 09, 2015 at 02:35:49PM +0000, Marc Zyngier wrote:
> > >> On 09/01/15 12:30, Christoffer Dall wrote:
> > >>> On Thu, Jan 08, 2015 at 11:59:08AM +0000, Marc Zyngier wrote:
> > >>>> Let's assume a guest has created an uncached mapping, and written
> > >>>> to that page. Let's also assume that the host uses a cache-coherent
> > >>>> IO subsystem. Let's finally assume that the host is under memory
> > >>>> pressure and starts to swap things out.
> > >>>>
> > >>>> Before this "uncached" page is evicted, we need to make sure it
> > >>>> gets invalidated, or the IO subsystem is going to swap out the
> > >>>> cached view, loosing the data that has been written there.
> > >>>>
> > >>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> > >>>> ---
> > >>>>  arch/arm/include/asm/kvm_mmu.h   | 31 +++++++++++++++++++++++++++
> > >>>>  arch/arm/kvm/mmu.c               | 46 +++++++++++++++++++++++++++-------------
> > >>>>  arch/arm64/include/asm/kvm_mmu.h | 18 ++++++++++++++++
> > >>>>  3 files changed, 80 insertions(+), 15 deletions(-)
> > >>>>
> > >>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> > >>>> index 63e0ecc..7ceb836 100644
> > >>>> --- a/arch/arm/include/asm/kvm_mmu.h
> > >>>> +++ b/arch/arm/include/asm/kvm_mmu.h
> > >>>> @@ -44,6 +44,7 @@
> > >>>>  
> > >>>>  #ifndef __ASSEMBLY__
> > >>>>  
> > >>>> +#include <linux/highmem.h>
> > >>>>  #include <asm/cacheflush.h>
> > >>>>  #include <asm/pgalloc.h>
> > >>>>  
> > >>>> @@ -190,6 +191,36 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
> > >>>>  
> > >>>>  #define kvm_virt_to_phys(x)		virt_to_idmap((unsigned long)(x))
> > >>>>  
> > >>>> +static inline void __kvm_flush_dcache_pte(pte_t pte)
> > >>>> +{
> > >>>> +	void *va = kmap_atomic(pte_page(pte));
> > >>>> +
> > >>>> +	kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> > >>>> +
> > >>>> +	kunmap_atomic(va);
> > >>>> +}
> > >>>> +
> > >>>> +static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
> > >>>> +{
> > >>>> +	unsigned long size = PMD_SIZE;
> > >>>> +	pfn_t pfn = pmd_pfn(pmd);
> > >>>> +
> > >>>> +	while (size) {
> > >>>> +		void *va = kmap_atomic_pfn(pfn);
> > >>>> +
> > >>>> +		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> > >>>> +
> > >>>> +		pfn++;
> > >>>> +		size -= PAGE_SIZE;
> > >>>> +
> > >>>> +		kunmap_atomic(va);
> > >>>> +	}
> > >>>> +}
> > >>>> +
> > >>>> +static inline void __kvm_flush_dcache_pud(pud_t pud)
> > >>>> +{
> > >>>> +}
> > >>>> +
> > >>>>  void stage2_flush_vm(struct kvm *kvm);
> > >>>>  
> > >>>>  #endif	/* !__ASSEMBLY__ */
> > >>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > >>>> index 1dc9778..1f5b793 100644
> > >>>> --- a/arch/arm/kvm/mmu.c
> > >>>> +++ b/arch/arm/kvm/mmu.c
> > >>>> @@ -58,6 +58,21 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> > >>>>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
> > >>>>  }
> > >>>>  
> > >>>> +static void kvm_flush_dcache_pte(pte_t pte)
> > >>>> +{
> > >>>> +	__kvm_flush_dcache_pte(pte);
> > >>>> +}
> > >>>> +
> > >>>> +static void kvm_flush_dcache_pmd(pmd_t pmd)
> > >>>> +{
> > >>>> +	__kvm_flush_dcache_pmd(pmd);
> > >>>> +}
> > >>>> +
> > >>>> +static void kvm_flush_dcache_pud(pud_t pud)
> > >>>> +{
> > >>>> +	__kvm_flush_dcache_pud(pud);
> > >>>> +}
> > >>>> +
> > >>>>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
> > >>>>  				  int min, int max)
> > >>>>  {
> > >>>> @@ -128,9 +143,12 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> > >>>>  	start_pte = pte = pte_offset_kernel(pmd, addr);
> > >>>>  	do {
> > >>>>  		if (!pte_none(*pte)) {
> > >>>> +			pte_t old_pte = *pte;
> > >>>>  			kvm_set_pte(pte, __pte(0));
> > >>>> -			put_page(virt_to_page(pte));
> > >>>
> > >>> was this a bug beforehand that we released the page before we flushed
> > >>> the TLB?
> > >>
> > >> I don't think so. TLB maintenance doesn't require the mapping to exist
> > >> in the page tables (while the cache maintenance do).
> > >>
> > > 
> > > duh, the put_page is the ref counting on the page table itself, not the
> > > underlying memory page.  Forget what I asked.
> > > 
> > >>>>  			kvm_tlb_flush_vmid_ipa(kvm, addr);
> > >>>> +			if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> > >>>> +				kvm_flush_dcache_pte(old_pte);
> > >>>
> > >>> this is confusing me: We are only flushing the cache for cached stage-2
> > >>> mappings?  Weren't we trying to flush the cache for uncached mappings?
> > >>> (we obviously also need flush a cached stage-2 mapping but where the
> > >>> guest is mapping it as uncached, but we don't know that easily).
> > >>>
> > >>> Am I missing something completely here?
> > >>
> > >> I think you must be misreading something:
> > >> - we want to invalidate mappings because the guest may have created an
> > >> uncached mapping
> > > 
> > > I don't quite understand: we are invalidating mappings because the page
> > > is being swapped out (and the guest must fault if it tries to access it
> > > again).  Not because the guest may have created an uncached mapping,
> > > that's just an aspect of the situation.  Or am I thinking about this the
> > > wrong way?
> > 
> > My wording was quite ambiguous. Indeed, we're removing the mapping
> > because the page is being evicted. In a perfect and ideal world (where
> > the guest doesn't do anything silly and everything is cache coherent),
> > we shouldn't have to do anything cache wise. But see below.
> > 
> > >> - as we cannot know about the guest's uncached mappings, we flush things
> > >> unconditionally (basically anything that is RAM).
> > > 
> > > so isn't the problem that the host may have an invalid cached view of
> > > the data, so we need to invalidate that view, not flush the invalid data
> > > to RAM?  Does the kernel take care of that somewhere else for a
> > > cache-coherent IO subsystem?
> > 
> > There is two potential problems:
> > 
> > - The guest has created an uncached mapping, and you have a cache
> > coherent IO subsystem: we need to invalidate the cached view. But since
> > we don't know where that mapping is (as we don't track guest mappings),
> > we must do a clean+invalidate in order not to corrupt cached mappings.
> 
> so here you have:
> 
> CACHE=dirty random old data
> RAM=data that the guest wrote
> 
> now your IO subsystem will read from the cache, which is bad.  But when
> you do a clean+invalidate, you get:
> 
> CACHE=invalidated
> RAM=dirty random old data
> 
> I still don't get it...
> 
ok, so talked to Marc earlier and he explained me the details.  I think
this is really quite a subtle, but nevertheless important situation to
catch:

If a guest maps certain memory pages as uncached, all writes will bypass
the data cache and go directly to RAM.  However, the CPUs can still
speculate reads (not writes) and fill cache lines with data.

Those cache lines will be *clean* cache lines though, so a
clean+invalidate operation is equivalent to an invalidate operation,
because no cache lines are marked dirty.

Those clean cache lines could be filled prior to an uncached write by
the guest, and the cache coherent IO subsystem would therefore end up
writing old data to disk.

We can therefore do exactly what Marc was suggesting to do safely, and
we have to do it.

I hope I finally got this right?

I think we need this explanation properly as part of the commit message
or as comments in the code, and I think we need to add a comment about
why it's pointless to flush device memory mappings (or that that is why
we do the check against PAGE_S2_DEVICE).

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux