On 11/17/14 16:29, Paolo Bonzini wrote: > > > On 17/11/2014 15:58, Ard Biesheuvel wrote: >> Readonly memslots are often used to implement emulation of ROMs and >> NOR flashes, in which case the guest may legally map these regions as >> uncached. >> To deal with the incoherency associated with uncached guest mappings, >> treat all readonly memslots as incoherent, and ensure that pages that >> belong to regions tagged as such are flushed to DRAM before being passed >> to the guest. > > On x86, the processor combines the cacheability values from the two > levels of page tables. Is there no way to do the same on ARM? Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict (Device non-Gathering, non-Reordering, no Early Write Acknowledgement -- for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax host) memory attributes. When qemu writes, as part of emulating the flash programming commands, to the RAMBlock that *otherwise* backs the flash range (as a r/o memslot), those writes (from host userspace) tend to end up in dcache. But, when the guest flips back the flash to romd mode, and tries to read back the values from the flash as plain ROM, the dcache is completely bypassed due to the strict stage1 mapping, and the guest goes directly to DRAM. Where qemu's earlier writes are not yet / necessarily visible. Please see my original patch (which was incomplete) in the attachment, it has a very verbose commit message. Anyway, I'll let others explain; they can word it better than I can :) FWIW, Series Reviewed-by: Laszlo Ersek <lersek@xxxxxxxxxx> I ported this series to a 3.17.0+ based kernel, and tested it. It works fine. The ROM-like view of the NOR flash now reflects the previously programmed contents. Series Tested-by: Laszlo Ersek <lersek@xxxxxxxxxx> Thanks! Laszlo
>From a2b4da9b03f03ccdb8b0988a5cc64d1967f00398 Mon Sep 17 00:00:00 2001 From: Laszlo Ersek <lersek@xxxxxxxxxx> Date: Sun, 16 Nov 2014 01:43:11 +0100 Subject: [PATCH] arm, arm64: KVM: clean cache on page fault also when IPA is uncached (WIP) This patch builds on Marc Zyngier's commit 2d58b733c87689d3d5144e4ac94ea861cc729145. (1) The guest bypasses the cache *not only* when the VCPU's dcache is disabled (see bit 0 and bit 2 in SCTLR_EL1, "MMU enable" and "Cache enable", respectively -- vcpu_has_cache_enabled()). The guest bypasses the cache *also* when the Stage 1 memory attributes say "device memory" about the Intermediate Page Address in question, independently of the Stage 2 memory attributes. Refer to: Table D5-38 Combining the stage 1 and stage 2 memory type assignments in the ARM ARM. (This is likely similar to MTRRs on x86.) (2) In edk2 (EFI Development Kit II), the ARM NOR flash driver, ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashFvbDxe.c uses the AddMemorySpace() and SetMemorySpaceAttributes() Global Coherency Domain Services of DXE (Driver eXecution Environment) to *justifiedly* set the attributes of the guest memory covering the flash chip to EFI_MEMORY_UC ("uncached"). According to the AArch64 bindings for UEFI (see "2.3.6.1 Memory types" in the UEFI-2.4A specification), EFI_MEMORY_UC is mapped to: ARM Memory Type: MAIR attribute encoding ARM Memory Type: EFI Memory Type Attr<n> [7:4] [3:0] Meaning --------------- ----------------------- ------------------------ EFI_MEMORY_UC 0000 0000 Device-nGnRnE (Device (Not cacheable) non-Gathering, non-Reordering, no Early Write Acknowledgement) This is correctly implemented in edk2, in the ArmConfigureMmu() function, via the ArmSetMAIR() call and the MAIR_ATTR() macro: The TT_ATTR_INDX_DEVICE_MEMORY (== 0) memory attribute index, which is used for EFI_MEMORY_UC memory, is associated with the MAIR_ATTR_DEVICE_MEMORY (== 0x00, see above) memory attribute value, in the MAIR_ELx register. As a consequence of (1) and (2), when edk2 code running in the guest accesses an IPA falling in the flash range, it will completely bypass the cache. Therefore, when such a page is faulted in in user_mem_abort(), we must flush the data cache; otherwise the guest will see stale data in the flash chip. This patch is not complete because I have no clue how to calculate the memory attribute for "fault_ipa" in user_mem_abort(). Right now I set "fault_ipa_uncached" to constant true, which might incur some performance penalty for data faults, but it certainly improves correctness -- the ArmVirtualizationPkg platform build of edk2 actually boots as a KVM guest on APM Mustang. Signed-off-by: Laszlo Ersek <lersek@xxxxxxxxxx> --- arch/arm/include/asm/kvm_mmu.h | 5 +++-- arch/arm64/include/asm/kvm_mmu.h | 5 +++-- arch/arm/kvm/mmu.c | 10 ++++++++-- 3 files changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index acb0d57..f867060 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -161,9 +161,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu) } static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva, - unsigned long size) + unsigned long size, + bool ipa_uncached) { - if (!vcpu_has_cache_enabled(vcpu)) + if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached) kvm_flush_dcache_to_poc((void *)hva, size); /* diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 0caf7a5..123b521 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -243,9 +243,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu) } static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva, - unsigned long size) + unsigned long size, + bool ipa_uncached) { - if (!vcpu_has_cache_enabled(vcpu)) + if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached) kvm_flush_dcache_to_poc((void *)hva, size); if (!icache_is_aliasing()) { /* PIPT */ diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 57a403a..e2323b7 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -847,6 +847,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct vm_area_struct *vma; pfn_t pfn; pgprot_t mem_type = PAGE_S2; + bool fault_ipa_uncached; write_fault = kvm_is_write_fault(vcpu); if (fault_status == FSC_PERM && !write_fault) { @@ -913,6 +914,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (!hugetlb && !force_pte) hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); + /* TBD */ + fault_ipa_uncached = true; + if (hugetlb) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); @@ -920,7 +924,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_set_s2pmd_writable(&new_pmd); kvm_set_pfn_dirty(pfn); } - coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE); + coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE, + fault_ipa_uncached); ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { pte_t new_pte = pfn_pte(pfn, mem_type); @@ -928,7 +933,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_set_s2pte_writable(&new_pte); kvm_set_pfn_dirty(pfn); } - coherent_cache_guest_page(vcpu, hva, PAGE_SIZE); + coherent_cache_guest_page(vcpu, hva, PAGE_SIZE, + fault_ipa_uncached); ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE)); } -- 1.8.3.1