On 10 November 2015 at 14:40, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
> On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
>> On 10 November 2015 at 13:22, Christoffer Dall
>> <christoffer.dall@xxxxxxxxxx> wrote:
>> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> >> Hi all,
>> >>
>> >> I wonder if this is a better way to address the problem. It looks at
>> >> the nature of the memory rather than the nature of the mapping, which
>> >> is probably a more reliable indicator of whether cache maintenance is
>> >> required when performing the unmap.
>> >>
>> >> -----------8<----------------
>> >> The open coded tests for checking whether a PTE maps a page as
>> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> >> which is not guaranteed to work since the type of a mapping is
>> >> not a set of mutually exclusive bits.
>> >>
>> >> For HYP mappings, the type is an index into the MAIR table (i.e., the
>> >> index itself does not contain any information whatsoever about the
>> >> type of the mapping), and for stage-2 mappings it is a bit field where
>> >> normal memory and device types are defined as follows:
>> >>
>> >>     #define MT_S2_NORMAL            0xf
>> >>     #define MT_S2_DEVICE_nGnRE      0x1
>> >>
>> >> I.e., masking *and* comparing with the latter matches on the former,
>> >> and we have been getting lucky merely because the S2 device mappings
>> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> >> as device mappings.
>> >>
>> >> Since the unmap_range() code path (which contains one instance of the
>> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> >> considering the difference between the two, it is non-trivial to fix
>> >> this by rewriting the tests in place, as it would involve passing
>> >> down the type of mapping through all the functions.
>> >>
>> >> However, since HYP mappings and stage-2 mappings both deal with host
>> >> physical addresses, we can simply check whether the mapping is backed
>> >> by memory that is managed by the host kernel, and only perform the
>> >> D-cache maintenance if this is the case.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>> >> ---
>> >>  arch/arm/kvm/mmu.c | 15 +++++++--------
>> >>  1 file changed, 7 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> >> index 6984342da13d..7dace909d5cf 100644
>> >> --- a/arch/arm/kvm/mmu.c
>> >> +++ b/arch/arm/kvm/mmu.c
>> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>> >>  	__kvm_flush_dcache_pud(pud);
>> >>  }
>> >>
>> >> +static bool kvm_is_device_pfn(unsigned long pfn)
>> >> +{
>> >> +	return !pfn_valid(pfn);
>> >> +}
>> >> +
>> >>  /**
>> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>> >>   * @kvm:	pointer to kvm structure.
>> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>  			kvm_tlb_flush_vmid_ipa(kvm, addr);
>> >>
>> >>  			/* No need to invalidate the cache for device mappings */
>> >> -			if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> +			if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>  				kvm_flush_dcache_pte(old_pte);
>> >>
>> >>  			put_page(virt_to_page(pte));
>> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>
>> >>  	pte = pte_offset_kernel(pmd, addr);
>> >>  	do {
>> >> -		if (!pte_none(*pte) &&
>> >> -		    (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> +		if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>  			kvm_flush_dcache_pte(*pte);
>> >>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>> >>  }
>> >> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>> >>  	return kvm_vcpu_dabt_iswrite(vcpu);
>> >>  }
>> >>
>> >> -static bool kvm_is_device_pfn(unsigned long pfn)
>> >> -{
>> >> -	return !pfn_valid(pfn);
>> >> -}
>> >> -
>> >>  /**
>> >>   * stage2_wp_ptes - write protect PMD range
>> >>   * @pmd:	pointer to pmd entry
>> >> --
>> >> 1.9.1
>> >>
>> >
>> > So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
>> > PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
>> > regions where the vma has (vm_flags & VM_PFNMAP).
>> >
>> > Will these, and only these, cases be covered by the pfn_valid check?
>> >
>>
>> The pfn_valid() check will ensure that cache maintenance is only
>> performed on regions that are known to the host as memory, are managed
>> by the host (i.e., there is a struct page associated with them) and
>> will be accessed by the host via cacheable mappings (they are covered
>> by the linear mapping, or [on ARM] will be kmap'ed cacheable if they
>> are highmem). If you look at the commit that introduced these tests
>> (363ef89f8e9b arm/arm64: KVM: Invalidate data cache on unmap), the
>> concern it addresses is that the guest may perform uncached accesses
>> to regions that the host has mapped cacheable, meaning guest writes
>> may be shadowed by clean cachelines, making them invisible to cache
>> coherent I/O. So afaict, pfn_valid() is an appropriate check here.
>
> right, this I agree with.
>
>> pfn_valid() will not misidentify device regions as memory (unless the
>> host is really broken) so this should fix Pavel's case. The converse
>> case (a memory region misidentified as a device) could only happen for
>> a carve-out (i.e., via the /reserved-memory node) that is mapped
>> inside the guest via a pass-through (VM_PFNMAP) mapping. That case is
>> already dodgy, since the guest accesses would be forced
>> uncached/ordered due to the fact that those mappings are forced
>> PAGE_S2_DEVICE at stage 2 (as you mention), and would also be
>> misidentified by the current code (due to the PAGE_S2_DEVICE
>> attributes)
>>
>
> ok, but such pages should never be swapped out by the host, so I think
> we're still ok here.
>
Yes, I think so, and the patch does not change how that case is handled
anyway.

> Will you send an updated proper patch to the list? You can add my
> RB if you want.
>
Thanks. I will resend with your R-b and Pavel's Tested-by added.

--
Ard.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html