On 15.08.2012, at 03:20, Scott Wood wrote: > On 08/14/2012 06:04 PM, Alexander Graf wrote: >> The e500 target has lived without mmu notifiers ever since it got >> introduced, but fails for the user space check on them with hugetlbfs. >> >> So in order to get that one working, implement mmu notifiers in a >> reasonably dumb fashion and be happy. On embedded hardware, we almost >> never end up with mmu notifier calls, since most people don't overcommit. >> >> Signed-off-by: Alexander Graf <agraf@xxxxxxx> >> --- >> arch/powerpc/include/asm/kvm_host.h | 3 +- >> arch/powerpc/include/asm/kvm_ppc.h | 1 + >> arch/powerpc/kvm/Kconfig | 2 + >> arch/powerpc/kvm/booke.c | 6 +++ >> arch/powerpc/kvm/e500_tlb.c | 60 +++++++++++++++++++++++++++++++--- >> 5 files changed, 65 insertions(+), 7 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h >> index a29e091..ff8d51c 100644 >> --- a/arch/powerpc/include/asm/kvm_host.h >> +++ b/arch/powerpc/include/asm/kvm_host.h >> @@ -45,7 +45,8 @@ >> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 >> #endif >> >> -#ifdef CONFIG_KVM_BOOK3S_64_HV >> +#if defined(CONFIG_KVM_BOOK3S_64_HV) || defined(CONFIG_KVM_E500V2) || \ >> + defined(CONFIG_KVM_E500MC) >> #include <linux/mmu_notifier.h> >> >> #define KVM_ARCH_WANT_MMU_NOTIFIER >> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h >> index 0124937..c38e824 100644 >> --- a/arch/powerpc/include/asm/kvm_ppc.h >> +++ b/arch/powerpc/include/asm/kvm_ppc.h >> @@ -104,6 +104,7 @@ extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu, >> struct kvm_interrupt *irq); >> extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu, >> struct kvm_interrupt *irq); >> +extern void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu); >> >> extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, >> unsigned int op, int *advance); >> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig >> index f4dacb9..40cad8c 100644 >> --- a/arch/powerpc/kvm/Kconfig >> +++ b/arch/powerpc/kvm/Kconfig >> @@ -123,6 +123,7 @@ config KVM_E500V2 >> depends on EXPERIMENTAL && E500 && !PPC_E500MC >> select KVM >> select KVM_MMIO >> + select MMU_NOTIFIER >> ---help--- >> Support running unmodified E500 guest kernels in virtual machines on >> E500v2 host processors. >> @@ -138,6 +139,7 @@ config KVM_E500MC >> select KVM >> select KVM_MMIO >> select KVM_BOOKE_HV >> + select MMU_NOTIFIER >> ---help--- >> Support running unmodified E500MC/E5500 (32-bit) guest kernels in >> virtual machines on E500MC/E5500 host processors. >> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c >> index 70a86c0..52f6cbb 100644 >> --- a/arch/powerpc/kvm/booke.c >> +++ b/arch/powerpc/kvm/booke.c >> @@ -459,6 +459,10 @@ static void kvmppc_check_requests(struct kvm_vcpu *vcpu) >> if (vcpu->requests) { >> if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu)) >> update_timer_ints(vcpu); >> +#if defined(CONFIG_KVM_E500V2) || defined(CONFIG_KVM_E500MC) >> + if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) >> + kvmppc_core_flush_tlb(vcpu); >> +#endif >> } >> } > > Can we define a new symbol that means "e500_tlb.c is used"? Or just say > that all new TLB implementations shall support MMU notifiers, and change > this to ifndef 4xx. Or make this a tlb flush callback with a no-op stub > in 4xx. I'd for the "all TLB implementations shall support MMU notifiers". Period. Including 440. The current state is only interim until we either * fix 440 or * remove 440 I do have a 440 machine here that I'm trying to get the current code to run on, but I don't have forever to spend on this and something seems odd. For some reason we seem to be jumping into the data section. Maybe you can spot something from the logs below? If it wasn't for the overall brokenness of the target, I would've quickly implemented MMU Notifiers for 440 already. root@ppc440:~/qemu# ./ppc-softmmu/qemu-system-ppc -M bamboo -kernel /boot/uImage.autotest -nographic -enable-kvm -s Using PowerPC 44x Platform machine description Linux version 3.6.0-rc1-00221-g7542507 (agraf@wolfberry-1) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #20 Wed Aug 15 01:46:42 CEST 2012 bootconsole [udbg0] enabled setup_arch: bootmem arch: exit Zone ranges: DMA [mem 0x00000000-0x07ffffff] Normal empty Movable zone start for each node Early memory node ranges node 0: [mem 0x00000000-0x07ffffff] MMU: Allocated 1088 bytes of context maps for 255 contexts Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512 Kernel command line: PID hash table entries: 512 (order: -1, 2048 bytes) Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) Memory: 125352k/131072k available (4052k kernel code, 5720k reserved, 172k data, 244k bss, 220k init) Kernel virtual memory layout: * 0xfffdf000..0xfffff000 : fixmap * 0xfde00000..0xfe000000 : consistent mem * 0xfddfe000..0xfde00000 : early ioremap * 0xd1000000..0xfddfe000 : vmalloc & ioremap SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 NR_IRQS:512 nr_irqs:512 16 KVM: unknown exit, hardware reason ffffffff7c030306 NIP c03c57ac LR c03c56ec CTR c03198b4 XER 00000000 MSR 00021002 HID0 00000000 HF 00000000 idx 1 TB 00000195 2403482311 DECR 00000000 GPR00 0000000000000000 00000000c0415f60 00000000c03f82e0 00000000000000c2 GPR04 0000000000000000 0000000000000000 00000000000000c0 00000000c054f0c0 GPR08 0000000000000000 0000000020000000 0000000000000001 00000000c03fe408 GPR12 0000000022000022 0000000000000000 0000000000000000 0000000000000000 GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20 0000000000000000 0000000000000000 00000000c0000010 0000000000000000 GPR24 0000000000000000 0000000000000000 00000000c03e3720 00000000c0420538 GPR28 00000000c0374280 00000000c0415f68 00000000c0420000 00000000c7800300 CR 42000028 [ G E - - - - E L ] RES ffffffff FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000 FPSCR 00000000 SRR0 c000e754 SRR1 00021002 PVR 422218d3 VRSAVE 00000000 SPRG0 c0420000 SPRG1 00000000 SPRG2 00000000 SPRG3 c03f84c0 SPRG4 44000022 SPRG5 00000000 SPRG6 00000000 SPRG7 82000028 CSRR0 00000000 CSRR1 00000000 MCSRR0 00000000 MCSRR1 00000000 TCR 00000000 TSR 00000000 ESR 00800000 DEAR fddfe303 PIR 00000000 DECAR 00000000 IVPR c0000000 EPCR 00000000 MCSR 00000000 SPRG8 00000000 EPR 00000000 MCAR 00000000 PID1 00000000 PID2 00000000 SVR 00000000 [...] qemu-system-ppc-1895 [000] .... 423.529387: kvm_exit: exit=ITLB_MISS | pc=0xc00b1480 | msr=0x21002 | dar=0xc044ba70 | last_inst=0x7f400106 qemu-system-ppc-1895 [000] .... 423.529394: kvm_stlb_inval: stlb_index 50 qemu-system-ppc-1895 [000] .... 423.529395: kvm_stlb_write: victim 50 tid 1 w0 3221951248 w1 307912704 w2 575 qemu-system-ppc-1895 [000] .... 423.529399: kvm_exit: exit=DTLB_MISS | pc=0xc00e47e0 | msr=0x21002 | dar=0xc7802284 | last_inst=0x801a0004 qemu-system-ppc-1895 [000] .... 423.529405: kvm_stlb_inval: stlb_index 51 qemu-system-ppc-1895 [000] .... 423.529406: kvm_stlb_write: victim 51 tid 1 w0 3347063568 w1 296411136 w2 575 qemu-system-ppc-1895 [000] .... 423.529410: kvm_exit: exit=DTLB_MISS | pc=0xc00e48bc | msr=0x21002 | dar=0xc0420200 | last_inst=0x812b0200 qemu-system-ppc-1895 [000] .... 423.529417: kvm_stlb_inval: stlb_index 52 qemu-system-ppc-1895 [000] .... 423.529418: kvm_stlb_write: victim 52 tid 1 w0 3225551632 w1 301301760 w2 575 qemu-system-ppc-1895 [000] .... 423.529421: kvm_exit: exit=DTLB_MISS | pc=0xc00e4910 | msr=0x21002 | dar=0xc7805000 | last_inst=0x7fbb012e qemu-system-ppc-1895 [000] .... 423.529450: kvm_stlb_inval: stlb_index 53 qemu-system-ppc-1895 [000] .... 423.529451: kvm_stlb_write: victim 53 tid 1 w0 3347075856 w1 296632320 w2 575 qemu-system-ppc-1895 [000] .... 423.529457: kvm_exit: exit=DTLB_MISS | pc=0xc00e5204 | msr=0x21002 | dar=0xc0561058 | last_inst=0x809d0008 qemu-system-ppc-1895 [000] .... 423.529464: kvm_stlb_inval: stlb_index 54 qemu-system-ppc-1895 [000] .... 423.529465: kvm_stlb_write: victim 54 tid 1 w0 3226866448 w1 295124992 w2 575 qemu-system-ppc-1895 [000] .... 423.529468: kvm_exit: exit=PROGRAM | pc=0xc00e5138 | msr=0x21002 | dar=0xc0561058 | last_inst=0x7f400106 qemu-system-ppc-1895 [000] .... 423.529471: kvm_ppc_instr: inst 2134900998 pc 0xc00e5138 emulate 0 qemu-system-ppc-1895 [000] .... 423.529475: kvm_exit: exit=ITLB_MISS | pc=0xc03199d8 | msr=0x21002 | dar=0xc0561058 | last_inst=0x7f400106 qemu-system-ppc-1895 [000] .... 423.529482: kvm_stlb_inval: stlb_index 55 qemu-system-ppc-1895 [000] .... 423.529483: kvm_stlb_write: victim 55 tid 1 w0 3224474384 w1 301121536 w2 575 qemu-system-ppc-1895 [000] .... 423.529488: kvm_exit: exit=PROGRAM | pc=0xc03c57ac | msr=0x21002 | dar=0xc0561058 | last_inst=0x7c030306 qemu-system-ppc-1895 [000] .N.. 423.534502: kvm_booke_queue_irqprio: vcpu=0 prio=3 pending=0 qemu-system-ppc-1895 [000] .N.. 423.534505: kvm_ppc_instr: inst 2080572166 pc 0xc03c57ac emulate 3 qemu-system-ppc-1895 [000] .N.. 423.539462: kvm_booke_queue_irqprio: vcpu=0 prio=3 pending=8 qemu-system-ppc-1895 [000] .N.. 423.539468: kvm_userspace_exit: reason KVM_EXIT_UNKNOWN (0) (gdb) x /i 0xc03c57ac 0xc03c57ac <__kvmppc_vcpu_run+3789496>: mtdcrx r3,r0 Unfortunately, TLB synchronization between KVM and QEMU isn't implemented for 440, so bt in gdb doesn't work. Sigh. > Of course this isn't critical and shouldn't hold up the pull request > since I got around to the review late -- just something to think about > for further refactoring. > >> @@ -579,6 +583,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) >> #endif >> >> kvm_guest_exit(); >> + vcpu->mode = OUTSIDE_GUEST_MODE; >> + smp_wmb(); >> >> out: >> vcpu->mode = OUTSIDE_GUEST_MODE; > > This looks wrong. Yeah. A later patch fixes it up again :(. Sorry. > >> diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c >> index 93f3b92..06273a7 100644 >> --- a/arch/powerpc/kvm/e500_tlb.c >> +++ b/arch/powerpc/kvm/e500_tlb.c >> @@ -303,18 +303,15 @@ static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref, >> ref->pfn = pfn; >> ref->flags = E500_TLB_VALID; >> >> - if (tlbe_is_writable(gtlbe)) >> + if (tlbe_is_writable(gtlbe)) { >> ref->flags |= E500_TLB_DIRTY; >> + kvm_set_pfn_dirty(pfn); >> + } >> } > > Is there any reason to keep E500_TLB_DIRTY around? You seem to be > removing the only code that checks it. Nope, none. We can safely remove it. > >> @@ -357,6 +354,13 @@ static void clear_tlb_refs(struct kvmppc_vcpu_e500 *vcpu_e500) >> clear_tlb_privs(vcpu_e500); >> } >> >> +void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu) >> +{ >> + struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu); >> + clear_tlb_refs(vcpu_e500); >> + clear_tlb1_bitmap(vcpu_e500); >> +} > > Should we move clear_tlb1_bitmap() into clear_tlb_refs()? That is, is > it a bug that we don't do it in ioctl_dirty_tlb()? I think so, yes :). > >> +/************* MMU Notifiers *************/ >> + > > Is this really necessary? The comment? No, comments are never necessary. But the code is slowly becoming bigger and bigger and having individual sections in it improves readability IMHO. > >> +int kvm_unmap_hva(struct kvm *kvm, unsigned long hva) >> +{ >> + /* >> + * Flush all shadow tlb entries everywhere. This is slow, but >> + * we are 100% sure that we catch the to be unmapped page >> + */ >> + kvm_flush_remote_tlbs(kvm); >> + >> + return 0; >> +} >> + >> +int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end) >> +{ >> + /* kvm_unmap_hva flushes everything anyways */ >> + kvm_unmap_hva(kvm, start); >> + >> + return 0; >> +} > > I'd feel better about this calling kvm_flush_remote_tlbs() directly > rather than hoping that someone who enhances kvm_unmap_hva() updates > this function as well. The reasons I went for this approach were: * originally I called a generic helper that would get a mask and an HVA, but it turned out to be useless because it's easier to flush everything for now * kvm_unmap_hva gets a trace point. This way we get traces for all unmap operations. I would say chances are very low that someone changes kvm_unmap_hva without also touching kvm_unmap_hva_range. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html