Re: [PATCH 18/38] KVM: PPC: E500: Implement MMU notifiers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 15.08.2012, at 03:20, Scott Wood wrote:

> On 08/14/2012 06:04 PM, Alexander Graf wrote:
>> The e500 target has lived without mmu notifiers ever since it got
>> introduced, but fails for the user space check on them with hugetlbfs.
>> 
>> So in order to get that one working, implement mmu notifiers in a
>> reasonably dumb fashion and be happy. On embedded hardware, we almost
>> never end up with mmu notifier calls, since most people don't overcommit.
>> 
>> Signed-off-by: Alexander Graf <agraf@xxxxxxx>
>> ---
>> arch/powerpc/include/asm/kvm_host.h |    3 +-
>> arch/powerpc/include/asm/kvm_ppc.h  |    1 +
>> arch/powerpc/kvm/Kconfig            |    2 +
>> arch/powerpc/kvm/booke.c            |    6 +++
>> arch/powerpc/kvm/e500_tlb.c         |   60 +++++++++++++++++++++++++++++++---
>> 5 files changed, 65 insertions(+), 7 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index a29e091..ff8d51c 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -45,7 +45,8 @@
>> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>> #endif
>> 
>> -#ifdef CONFIG_KVM_BOOK3S_64_HV
>> +#if defined(CONFIG_KVM_BOOK3S_64_HV) || defined(CONFIG_KVM_E500V2) || \
>> +    defined(CONFIG_KVM_E500MC)
>> #include <linux/mmu_notifier.h>
>> 
>> #define KVM_ARCH_WANT_MMU_NOTIFIER
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index 0124937..c38e824 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -104,6 +104,7 @@ extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
>>                                        struct kvm_interrupt *irq);
>> extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
>>                                          struct kvm_interrupt *irq);
>> +extern void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu);
>> 
>> extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>                                   unsigned int op, int *advance);
>> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
>> index f4dacb9..40cad8c 100644
>> --- a/arch/powerpc/kvm/Kconfig
>> +++ b/arch/powerpc/kvm/Kconfig
>> @@ -123,6 +123,7 @@ config KVM_E500V2
>> 	depends on EXPERIMENTAL && E500 && !PPC_E500MC
>> 	select KVM
>> 	select KVM_MMIO
>> +	select MMU_NOTIFIER
>> 	---help---
>> 	  Support running unmodified E500 guest kernels in virtual machines on
>> 	  E500v2 host processors.
>> @@ -138,6 +139,7 @@ config KVM_E500MC
>> 	select KVM
>> 	select KVM_MMIO
>> 	select KVM_BOOKE_HV
>> +	select MMU_NOTIFIER
>> 	---help---
>> 	  Support running unmodified E500MC/E5500 (32-bit) guest kernels in
>> 	  virtual machines on E500MC/E5500 host processors.
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 70a86c0..52f6cbb 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -459,6 +459,10 @@ static void kvmppc_check_requests(struct kvm_vcpu *vcpu)
>> 	if (vcpu->requests) {
>> 		if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu))
>> 			update_timer_ints(vcpu);
>> +#if defined(CONFIG_KVM_E500V2) || defined(CONFIG_KVM_E500MC)
>> +		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
>> +			kvmppc_core_flush_tlb(vcpu);
>> +#endif
>> 	}
>> }
> 
> Can we define a new symbol that means "e500_tlb.c is used"?  Or just say
> that all new TLB implementations shall support MMU notifiers, and change
> this to ifndef 4xx.  Or make this a tlb flush callback with a no-op stub
> in 4xx.

I'd for the "all TLB implementations shall support MMU notifiers". Period. Including 440. The current state is only interim until we either

  * fix 440 or
  * remove 440

I do have a 440 machine here that I'm trying to get the current code to run on, but I don't have forever to spend on this and something seems odd. For some reason we seem to be jumping into the data section. Maybe you can spot something from the logs below?

If it wasn't for the overall brokenness of the target, I would've quickly implemented MMU Notifiers for 440 already.

root@ppc440:~/qemu# ./ppc-softmmu/qemu-system-ppc -M bamboo -kernel /boot/uImage.autotest -nographic -enable-kvm -s
Using PowerPC 44x Platform machine description
Linux version 3.6.0-rc1-00221-g7542507 (agraf@wolfberry-1) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #20 Wed Aug 15 01:46:42 CEST 2012
bootconsole [udbg0] enabled
setup_arch: bootmem
arch: exit
Zone ranges:
  DMA      [mem 0x00000000-0x07ffffff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x00000000-0x07ffffff]
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
Kernel command line: 
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 125352k/131072k available (4052k kernel code, 5720k reserved, 172k data, 244k bss, 220k init)
Kernel virtual memory layout:
  * 0xfffdf000..0xfffff000  : fixmap
  * 0xfde00000..0xfe000000  : consistent mem
  * 0xfddfe000..0xfde00000  : early ioremap
  * 0xd1000000..0xfddfe000  : vmalloc & ioremap
SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:512 nr_irqs:512 16
KVM: unknown exit, hardware reason ffffffff7c030306
NIP c03c57ac   LR c03c56ec CTR c03198b4 XER 00000000
MSR 00021002 HID0 00000000  HF 00000000 idx 1
TB 00000195 2403482311 DECR 00000000
GPR00 0000000000000000 00000000c0415f60 00000000c03f82e0 00000000000000c2
GPR04 0000000000000000 0000000000000000 00000000000000c0 00000000c054f0c0
GPR08 0000000000000000 0000000020000000 0000000000000001 00000000c03fe408
GPR12 0000000022000022 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 00000000c0000010 0000000000000000
GPR24 0000000000000000 0000000000000000 00000000c03e3720 00000000c0420538
GPR28 00000000c0374280 00000000c0415f68 00000000c0420000 00000000c7800300
CR 42000028  [ G  E  -  -  -  -  E  L  ]             RES ffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
 SRR0 c000e754  SRR1 00021002    PVR 422218d3 VRSAVE 00000000
SPRG0 c0420000 SPRG1 00000000  SPRG2 00000000  SPRG3 c03f84c0
SPRG4 44000022 SPRG5 00000000  SPRG6 00000000  SPRG7 82000028
CSRR0 00000000 CSRR1 00000000 MCSRR0 00000000 MCSRR1 00000000
  TCR 00000000   TSR 00000000    ESR 00800000   DEAR fddfe303
  PIR 00000000 DECAR 00000000   IVPR c0000000   EPCR 00000000
 MCSR 00000000 SPRG8 00000000    EPR 00000000
 MCAR 00000000  PID1 00000000   PID2 00000000    SVR 00000000

[...]
 qemu-system-ppc-1895  [000] ....   423.529387: kvm_exit: exit=ITLB_MISS | pc=0xc00b1480 | msr=0x21002 | dar=0xc044ba70 | last_inst=0x7f400106
 qemu-system-ppc-1895  [000] ....   423.529394: kvm_stlb_inval: stlb_index 50
 qemu-system-ppc-1895  [000] ....   423.529395: kvm_stlb_write: victim 50 tid 1 w0 3221951248 w1 307912704 w2 575
 qemu-system-ppc-1895  [000] ....   423.529399: kvm_exit: exit=DTLB_MISS | pc=0xc00e47e0 | msr=0x21002 | dar=0xc7802284 | last_inst=0x801a0004
 qemu-system-ppc-1895  [000] ....   423.529405: kvm_stlb_inval: stlb_index 51
 qemu-system-ppc-1895  [000] ....   423.529406: kvm_stlb_write: victim 51 tid 1 w0 3347063568 w1 296411136 w2 575
 qemu-system-ppc-1895  [000] ....   423.529410: kvm_exit: exit=DTLB_MISS | pc=0xc00e48bc | msr=0x21002 | dar=0xc0420200 | last_inst=0x812b0200
 qemu-system-ppc-1895  [000] ....   423.529417: kvm_stlb_inval: stlb_index 52
 qemu-system-ppc-1895  [000] ....   423.529418: kvm_stlb_write: victim 52 tid 1 w0 3225551632 w1 301301760 w2 575
 qemu-system-ppc-1895  [000] ....   423.529421: kvm_exit: exit=DTLB_MISS | pc=0xc00e4910 | msr=0x21002 | dar=0xc7805000 | last_inst=0x7fbb012e
 qemu-system-ppc-1895  [000] ....   423.529450: kvm_stlb_inval: stlb_index 53
 qemu-system-ppc-1895  [000] ....   423.529451: kvm_stlb_write: victim 53 tid 1 w0 3347075856 w1 296632320 w2 575
 qemu-system-ppc-1895  [000] ....   423.529457: kvm_exit: exit=DTLB_MISS | pc=0xc00e5204 | msr=0x21002 | dar=0xc0561058 | last_inst=0x809d0008
 qemu-system-ppc-1895  [000] ....   423.529464: kvm_stlb_inval: stlb_index 54
 qemu-system-ppc-1895  [000] ....   423.529465: kvm_stlb_write: victim 54 tid 1 w0 3226866448 w1 295124992 w2 575
 qemu-system-ppc-1895  [000] ....   423.529468: kvm_exit: exit=PROGRAM | pc=0xc00e5138 | msr=0x21002 | dar=0xc0561058 | last_inst=0x7f400106
 qemu-system-ppc-1895  [000] ....   423.529471: kvm_ppc_instr: inst 2134900998 pc 0xc00e5138 emulate 0

 qemu-system-ppc-1895  [000] ....   423.529475: kvm_exit: exit=ITLB_MISS | pc=0xc03199d8 | msr=0x21002 | dar=0xc0561058 | last_inst=0x7f400106
 qemu-system-ppc-1895  [000] ....   423.529482: kvm_stlb_inval: stlb_index 55
 qemu-system-ppc-1895  [000] ....   423.529483: kvm_stlb_write: victim 55 tid 1 w0 3224474384 w1 301121536 w2 575
 qemu-system-ppc-1895  [000] ....   423.529488: kvm_exit: exit=PROGRAM | pc=0xc03c57ac | msr=0x21002 | dar=0xc0561058 | last_inst=0x7c030306
 qemu-system-ppc-1895  [000] .N..   423.534502: kvm_booke_queue_irqprio: vcpu=0 prio=3 pending=0
 qemu-system-ppc-1895  [000] .N..   423.534505: kvm_ppc_instr: inst 2080572166 pc 0xc03c57ac emulate 3

 qemu-system-ppc-1895  [000] .N..   423.539462: kvm_booke_queue_irqprio: vcpu=0 prio=3 pending=8
 qemu-system-ppc-1895  [000] .N..   423.539468: kvm_userspace_exit: reason KVM_EXIT_UNKNOWN (0)

(gdb) x /i 0xc03c57ac
   0xc03c57ac <__kvmppc_vcpu_run+3789496>:	mtdcrx  r3,r0

Unfortunately, TLB synchronization between KVM and QEMU isn't implemented for 440, so bt in gdb doesn't work. Sigh.

> Of course this isn't critical and shouldn't hold up the pull request
> since I got around to the review late -- just something to think about
> for further refactoring.
> 
>> @@ -579,6 +583,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>> #endif
>> 
>> 	kvm_guest_exit();
>> +	vcpu->mode = OUTSIDE_GUEST_MODE;
>> +	smp_wmb();
>> 
>> out:
>> 	vcpu->mode = OUTSIDE_GUEST_MODE;
> 
> This looks wrong.

Yeah. A later patch fixes it up again :(. Sorry.

> 
>> diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
>> index 93f3b92..06273a7 100644
>> --- a/arch/powerpc/kvm/e500_tlb.c
>> +++ b/arch/powerpc/kvm/e500_tlb.c
>> @@ -303,18 +303,15 @@ static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
>> 	ref->pfn = pfn;
>> 	ref->flags = E500_TLB_VALID;
>> 
>> -	if (tlbe_is_writable(gtlbe))
>> +	if (tlbe_is_writable(gtlbe)) {
>> 		ref->flags |= E500_TLB_DIRTY;
>> +		kvm_set_pfn_dirty(pfn);
>> +	}
>> }
> 
> Is there any reason to keep E500_TLB_DIRTY around?  You seem to be
> removing the only code that checks it.

Nope, none. We can safely remove it.

> 
>> @@ -357,6 +354,13 @@ static void clear_tlb_refs(struct kvmppc_vcpu_e500 *vcpu_e500)
>> 	clear_tlb_privs(vcpu_e500);
>> }
>> 
>> +void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>> +	clear_tlb_refs(vcpu_e500);
>> +	clear_tlb1_bitmap(vcpu_e500);
>> +}
> 
> Should we move clear_tlb1_bitmap() into clear_tlb_refs()?  That is, is
> it a bug that we don't do it in ioctl_dirty_tlb()?

I think so, yes :).

> 
>> +/************* MMU Notifiers *************/
>> +
> 
> Is this really necessary?

The comment? No, comments are never necessary. But the code is slowly becoming bigger and bigger and having individual sections in it improves readability IMHO.

> 
>> +int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
>> +{
>> +	/*
>> +	 * Flush all shadow tlb entries everywhere. This is slow, but
>> +	 * we are 100% sure that we catch the to be unmapped page
>> +	 */
>> +	kvm_flush_remote_tlbs(kvm);
>> +
>> +	return 0;
>> +}
>> +
>> +int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
>> +{
>> +	/* kvm_unmap_hva flushes everything anyways */
>> +	kvm_unmap_hva(kvm, start);
>> +
>> +	return 0;
>> +}
> 
> I'd feel better about this calling kvm_flush_remote_tlbs() directly
> rather than hoping that someone who enhances kvm_unmap_hva() updates
> this function as well.

The reasons I went for this approach were:

  * originally I called a generic helper that would get a mask and an HVA, but it turned out to be useless because it's easier to flush everything for now
  * kvm_unmap_hva gets a trace point. This way we get traces for all unmap operations.

I would say chances are very low that someone changes kvm_unmap_hva without also touching kvm_unmap_hva_range.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux