On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
> On 10 November 2015 at 13:22, Christoffer Dall
> <christoffer.dall@xxxxxxxxxx> wrote:
> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
> >> Hi all,
> >>
> >> I wonder if this is a better way to address the problem. It looks at
> >> the nature of the memory rather than the nature of the mapping, which
> >> is probably a more reliable indicator of whether cache maintenance is
> >> required when performing the unmap.
> >>
> >> -----------8<----------------
> >> The open coded tests for checking whether a PTE maps a page as
> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
> >> which is not guaranteed to work since the type of a mapping is
> >> not a set of mutually exclusive bits.
> >>
> >> For HYP mappings, the type is an index into the MAIR table (i.e., the
> >> index itself does not contain any information whatsoever about the
> >> type of the mapping), and for stage-2 mappings it is a bit field where
> >> normal memory and device types are defined as follows:
> >>
> >> #define MT_S2_NORMAL		0xf
> >> #define MT_S2_DEVICE_nGnRE	0x1
> >>
> >> I.e., masking *and* comparing with the latter matches on the former,
> >> and we have been getting lucky merely because the S2 device mappings
> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
> >> as device mappings.
> >>
> >> Since the unmap_range() code path (which contains one instance of the
> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
> >> considering the difference between the two, it is non-trivial to fix
> >> this by rewriting the tests in place, as it would involve passing
> >> down the type of mapping through all the functions.
> >>
> >> However, since HYP mappings and stage-2 mappings both deal with host
> >> physical addresses, we can simply check whether the mapping is backed
> >> by memory that is managed by the host kernel, and only perform the
> >> D-cache maintenance if this is the case.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> >> ---
> >>  arch/arm/kvm/mmu.c | 15 +++++++--------
> >>  1 file changed, 7 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 6984342da13d..7dace909d5cf 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
> >>  	__kvm_flush_dcache_pud(pud);
> >>  }
> >>
> >> +static bool kvm_is_device_pfn(unsigned long pfn)
> >> +{
> >> +	return !pfn_valid(pfn);
> >> +}
> >> +
> >>  /**
> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
> >>   * @kvm:	pointer to kvm structure.
> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> >>  			kvm_tlb_flush_vmid_ipa(kvm, addr);
> >>
> >>  			/* No need to invalidate the cache for device mappings */
> >> -			if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> >> +			if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> >>  				kvm_flush_dcache_pte(old_pte);
> >>
> >>  			put_page(virt_to_page(pte));
> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> >>
> >>  	pte = pte_offset_kernel(pmd, addr);
> >>  	do {
> >> -		if (!pte_none(*pte) &&
> >> -		    (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> >> +		if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> >>  			kvm_flush_dcache_pte(*pte);
> >>  	} while (pte++, addr += PAGE_SIZE, addr != end);
> >>  }
> >> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >>  	return kvm_vcpu_dabt_iswrite(vcpu);
> >>  }
> >>
> >> -static bool kvm_is_device_pfn(unsigned long pfn)
> >> -{
> >> -	return !pfn_valid(pfn);
> >> -}
> >> -
> >>  /**
> >>   * stage2_wp_ptes - write protect PMD range
> >>   * @pmd:	pointer to pmd entry
> >> --
> >> 1.9.1
> >>
> >
> > So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
> > PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
> > regions where the vma has (vm_flags & VM_PFNMAP).
> >
> > Will these, and only these, cases be covered by the pfn_valid check?
> >
>
> The pfn_valid() check will ensure that cache maintenance is only
> performed on regions that are known to the host as memory, are managed
> by the host (i.e., there is a struct page associated with them) and
> will be accessed by the host via cacheable mappings (they are covered
> by the linear mapping, or [on ARM] will be kmap'ed cacheable if they
> are highmem). If you look at the commit that introduced these tests
> (363ef89f8e9b, "arm/arm64: KVM: Invalidate data cache on unmap"), the
> concern it addresses is that the guest may perform uncached accesses
> to regions that the host has mapped cacheable, meaning guest writes
> may be shadowed by clean cachelines, making them invisible to cache
> coherent I/O. So afaict, pfn_valid() is an appropriate check here.

right, this I agree with.

> pfn_valid() will not misidentify device regions as memory (unless the
> host is really broken) so this should fix Pavel's case. The converse
> case (a memory region misidentified as a device) could only happen for
> a carve-out (i.e., via the /reserved-memory node) that is mapped
> inside the guest via a pass-through (VM_PFNMAP) mapping. That case is
> already dodgy, since the guest accesses would be forced
> uncached/ordered due to the fact that those mappings are forced
> PAGE_S2_DEVICE at stage 2 (as you mention), and would also be
> misidentified by the current code (due to the PAGE_S2_DEVICE
> attributes).
>
ok, but such pages should never be swapped out by the host, so I think
we're still ok here.

Will you send an updated proper patch to the list?  You can add my RB
if you want.
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html