On 11/09/14 16:26, Ard Biesheuvel wrote:
> On 11 September 2014 16:52, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On 11/09/14 15:41, Ard Biesheuvel wrote:
>>> Hello all,
>>>
>>> I spent most of the day chasing a particularly weird heisenbug in the
>>> QEMU+KVM+UEFI combo.
>>> The symptom was that UEFI init would hang on the first write to the
>>> second NOR flash (to initialize the variable store), but *only* when
>>> using the -bios option (instead of -pflash) and a boot image of
>>> exactly 64 MB in size. Note that this implies that the second NOR
>>> flash was not file-backed.
>>>
>>> As it turns out, the choice of the -bios option and the size of the
>>> file affect whether KVM ends up using sections or pages to back the
>>> NOR flash, and in my failure case, it was using the latter. That
>>> sent KVM down a code path where the memory backing the NOR was
>>> writable, which broke the MMIO emulation and caused the hang on init
>>> of the variable store.
>>>
>>> The patch below fixes it for me.
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index c68ec28f17c3..121abc6fef97 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -817,7 +817,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>         pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>>>         if (is_error_pfn(pfn))
>>>                 return -EFAULT;
>>>
>>> -       if (kvm_is_mmio_pfn(pfn))
>>> +       if (writable && kvm_is_mmio_pfn(pfn))
>>>                 mem_type = PAGE_S2_DEVICE;
>>>
>>>         spin_lock(&kvm->mmu_lock);
>>>
>>> Here is the definition of kvm_is_mmio_pfn() for completeness. I am a
>>> bit out of my depth here, so perhaps someone else can shed some light
>>> on this?
>>>
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>>         if (pfn_valid(pfn))
>>>                 return PageReserved(pfn_to_page(pfn));
>>>
>>>         return true;
>>> }
>>>
>>> To me, it is particularly puzzling what PageReserved() has to do with
>>> anything, as I couldn't find any other uses of it under kvm/
>>>
>>
>> My understanding is that kvm_is_mmio_pfn() is used for *devices* that
>> are directly mapped (think device assignment). PageReserved() would
>> make sense there.
>>
>> Now, I'm not familiar with the whole QEMU setup, so maybe you could
>> describe how things are mapped, and what is supposed to happen?
>>
>
> When running QEMU with the -bios <file> option, the file is exposed
> to the guest as an emulated NOR flash at 0x0, so that you can boot
> from it directly.
> In my case, the NOR is used for the boot image itself, and for a
> non-volatile variable store at 0x400_0000, which is initialized by
> UEFI when it boots.
>
> In order for the NOR emulation to work, writes to the NOR need to
> trap, so that QEMU can take down the whole memory region, trapping all
> reads and writes, until a command is issued that puts it back into
> array mode, and the memory region is created again. In array mode,
> guest reads from the NOR go straight to host RAM. (Or they should: the
> patch you merged today fixed an issue where reads were mistaken for
> writes and sent to QEMU instead.)
>
> So when UEFI enters its non-volatile variable store driver, it first
> issues a read to the base of the second half of the NOR (0x400_0000),
> resulting in some host RAM being pinned to back it. However, for
> some reason, it is mapped as PAGE_S2_DEVICE, which is read-write, so
> when a write is subsequently issued to kick the NOR into command mode,
> it just gets sent to host RAM as well.
>
> Indeed, by the looks of it, kvm_is_mmio_pfn() is intended for mapping
> host device memory straight into the guest physical address space, but
> this is not what I am doing, so why is it returning 'true' here?
That's the bit I do not understand. Any chance you can find out who sets
this bit in the page tables?

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
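To make the effect of the one-line patch quoted above concrete, here is a small user-space model of the mem_type decision in user_mem_abort(). It is only a sketch: model_pfn, select_mem_type() and s2_allows_write() are hypothetical names, the valid/reserved fields stand in for pfn_valid() and PageReserved(), and the rule that a PAGE_S2_DEVICE entry is always read-write is taken from the thread rather than checked against the kernel headers. It does not answer the open question of why kvm_is_mmio_pfn() returns true for the NOR backing; it only shows that, once it does, the unpatched logic yields a writable stage-2 entry even when the host reports the page as not writable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are not the kernel's definitions. */
enum s2_mem_type { S2_NORMAL, S2_DEVICE };

struct model_pfn {
    bool valid;     /* stands in for pfn_valid(pfn) */
    bool reserved;  /* stands in for PageReserved(pfn_to_page(pfn)) */
};

/* Same decision as the kvm_is_mmio_pfn() definition quoted in the thread:
 * a pfn without a struct page, or with a reserved one, counts as MMIO. */
static bool model_is_mmio_pfn(struct model_pfn p)
{
    if (p.valid)
        return p.reserved;
    return true;
}

/* mem_type selection in user_mem_abort(), without (patched == false)
 * and with (patched == true) the proposed "writable &&" condition. */
static enum s2_mem_type select_mem_type(struct model_pfn p, bool writable,
                                        bool patched)
{
    bool mmio = model_is_mmio_pfn(p);

    if (patched ? (writable && mmio) : mmio)
        return S2_DEVICE;
    return S2_NORMAL;
}

/* Whether the resulting stage-2 entry lets guest writes through without
 * trapping. Assumption from the thread: a device entry is always
 * read-write, while a normal entry is writable only if the host said so. */
static bool s2_allows_write(enum s2_mem_type t, bool writable)
{
    return t == S2_DEVICE || writable;
}
```

With the patch, a read fault on a reserved page that the host reports as not writable falls back to a normal entry that stays read-only, so the guest's next write traps as the NOR emulation expects. One consequence worth checking: a genuine device mapping that is not writable would now also get normal memory attributes.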
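The NOR emulation Ard describes can be reduced to a toy two-state model. Everything here is hypothetical: nor_model and its helpers are not QEMU's flash code, the 16-byte backing array stands in for the host RAM behind the flash, and 0xFF as the "read array" command is an assumption based on the Intel-style CFI command set. The s2_writable flag models the failure: when the stage-2 mapping is writable, the guest's first command write lands in RAM instead of trapping, so the device never leaves array mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NOR_SIZE 16u
#define CMD_READ_ARRAY 0xFFu /* assumption: Intel-style "read array" command */

enum nor_mode { NOR_ARRAY, NOR_COMMAND };

/* Hypothetical toy model of an emulated NOR flash. */
struct nor_model {
    enum nor_mode mode;
    uint8_t backing[NOR_SIZE]; /* stands in for the host RAM behind the NOR */
    bool s2_writable;          /* does the stage-2 mapping allow guest writes? */
    unsigned traps;            /* accesses that reached the emulator (QEMU) */
};

/* Emulator side: only called for accesses that trap. */
static void emulate_write(struct nor_model *nor, uint8_t val)
{
    nor->traps++;
    if (val == CMD_READ_ARRAY)
        nor->mode = NOR_ARRAY;    /* region recreated: reads hit RAM again */
    else
        nor->mode = NOR_COMMAND;  /* region taken down: everything traps */
    /* program/erase semantics of individual commands are not modelled */
}

/* Guest write as seen through the stage-2 mapping. */
static void guest_write(struct nor_model *nor, unsigned off, uint8_t val)
{
    assert(off < NOR_SIZE);
    if (nor->mode == NOR_ARRAY && nor->s2_writable) {
        nor->backing[off] = val;  /* the bug: write silently lands in RAM */
        return;
    }
    emulate_write(nor, val);
}

/* Guest read as seen through the stage-2 mapping. */
static uint8_t guest_read(struct nor_model *nor, unsigned off)
{
    assert(off < NOR_SIZE);
    if (nor->mode == NOR_ARRAY)
        return nor->backing[off];  /* array mode: served from RAM, no trap */
    nor->traps++;                  /* command mode: the emulator handles it */
    return 0;                      /* status/query data not modelled */
}
```

Starting in array mode with s2_writable false, a write of 0x20 traps and switches to command mode, and a write of 0xFF switches back. With s2_writable true, the same 0x20 write just changes backing[0] and no trap is recorded, which matches the hang described in the thread.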