Re: [PATCH] KVM: PPC: Book3S PR: Enable use on POWER9 inside HPT-mode guests

On Thu, 24 May 2018 09:12:09 +1000
Paul Mackerras <paulus@xxxxxxxxxx> wrote:

> On Wed, May 23, 2018 at 07:04:21PM +0200, Greg Kurz wrote:
> > On Sat, 19 May 2018 15:56:38 +1000
> > Paul Mackerras <paulus@xxxxxxxxxx> wrote:
> >   
> > > This relaxes the restriction on using PR KVM on POWER9.  The existing
> > > code does work inside a guest partition running in HPT mode, because
> > > hypercalls such as H_ENTER use the old HPTE format, not the new
> > > format used by POWER9, and so no change to PR KVM's HPT manipulation
> > > code is required.  PR KVM will still refuse to run if the kernel is
> > > using radix translation or if it is running bare-metal.
> > > 
> > > Signed-off-by: Paul Mackerras <paulus@xxxxxxxxxx>
> > > ---  
> > 
> > Paul,
> > 
> > I have built a 4.16.0 kernel + this patch and booted the L1 guest
> > with "disable_radix=on". I could then successfully boot a L2 guest,
> > using the same kernel for simplicity. Both guests using identical
> > fedora28 images. So it seems to be working at first sight.
> > 
> > 
> > But, if I boot the L2 guest with the default fedora28 kernel, ie
> > 4.16.9-300.fc28.ppc64le, the L2 guest hangs.
> > 
> > OF stdout device is: /vdevice/vty@71000000
> > Preparing to boot Linux version 4.16.9-300.fc28.ppc64le (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)) #1 SMP Thu May 17 04:31:32 UTC 2018
> > Detected machine type: 0000000000000101
> > command line: BOOT_IMAGE=/boot/vmlinuz-4.16.9-300.fc28.ppc64le root=UUID=22128c5c-30b1-4e0a-ac16-95853df31131 ro rhgb console=hvc0 early_printk LANG=en_US.UTF-8
> > Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
> > Calling ibm,client-architecture-support... done
> > memory layout at init:
> >   memory_limit : 0000000000000000 (16 MB aligned)
> >   alloc_bottom : 0000000004e70000
> >   alloc_top    : 0000000030000000
> >   alloc_top_hi : 0000000100000000
> >   rmo_top      : 0000000030000000
> >   ram_top      : 0000000100000000
> > instantiating rtas at 0x000000002fff0000... done
> > prom_hold_cpus: skipped
> > copying OF device tree...
> > Building dt strings...
> > Building dt structure...
> > Device tree strings 0x0000000004e80000 -> 0x0000000004e80aaf
> > Device tree struct  0x0000000004e90000 -> 0x0000000004ea0000
> > Quiescing Open Firmware ...
> > Booting Linux via __start() @ 0x0000000002000000 ...
> > 
> > (qemu) p $pc
> > 0xc000000000026aa0
> > (qemu) p $lr
> > 0xc000000000119ff4
> > 
> > # addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000026aa0
> > /usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/./arch/powerpc/include/asm/time.h:115
> > 
> > # addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000119ff4
> > /usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/kernel/panic.c:300
> > 
> > ie, the final mdelay(PANIC_TIMER_STEP) in panic().
> > 
> > Not sure how to debug this further, any suggestion is welcome :)  
> 
> I suggest you find the address of log_buf from System.map, read that
> via the qemu command line (log_buf is a pointer), then dump the memory
> it points to, so you can see the panic message.
> 

Hi Paul,

Thanks for your suggestion.

I could reproduce the problem when booting the L2 guest with an upstream
kernel (commit d7b66b4ab034). I tried to dump the log_buf, but things
didn't go well:

$ grep 'd log_buf' System.map 
c000000001304f08 d log_buf_len
c000000001304f10 d log_buf

(qemu) x 0xc000000001304f08
c000000001304f08: Cannot access memory

Since 4.16.0 works, I could bisect down to:

commit dbfcf3cb9c681aa0c5d0bb46068f98d5b1823dd3
Author: Paul Mackerras <paulus@xxxxxxxxxx>
Date:   Thu Feb 16 16:03:39 2017 +1100

    powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9

The hcall is handled by QEMU, which then calls the KVM_PPC_CONFIGURE_V3_MMU
ioctl. That ioctl fails because PR KVM doesn't implement it, so
H_REGISTER_PROC_TBL fails with H_PARAMETER. The panic hence comes from...

static int pseries_lpar_register_process_table(unsigned long base,
			unsigned long page_size, unsigned long table_size)
{
	.
	.
	.
	for (;;) {
		rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
					page_size, table_size);
		if (!H_IS_LONG_BUSY(rc))
			break;
		mdelay(get_longbusy_msecs(rc));
	}
	if (rc != H_SUCCESS) {
		pr_err("Failed to register process table (rc=%ld)\n", rc);
		BUG();
		^^^
		here.

The changelog of commit dbfcf3cb9c68 reads:

" If the hypervisor is able to support both radix and HPT guests, it would
  be entitled to defer allocation of the HPT until the H_REGISTER_PROC_TBL
  call"

But in our case, the hypervisor is QEMU/PR KVM in an L1 guest booted with radix
disabled. It is hence not "entitled to defer allocation of the HPT", and QEMU
allocates one during initial machine reset.

If I patch QEMU to make H_REGISTER_PROC_TBL a nop when KVM_CAP_PPC_MMU_RADIX
returns 0, then the L2 kernel boots like a charm.
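
To make the idea concrete, here is a minimal model of that decision. It is
not the actual QEMU change: struct host_caps and model_h_register_proc_tbl()
are hypothetical names made up for this sketch, and the return codes are
assumed to mirror the values the kernel uses for H_SUCCESS and H_PARAMETER.

```c
#include <stdbool.h>

/* Assumed to mirror the PAPR hcall return codes used by the kernel. */
#define H_SUCCESS     0L
#define H_PARAMETER (-4L)

/* Hypothetical stand-in for what the host KVM reports to QEMU. */
struct host_caps {
    bool radix_capable;     /* models KVM_CAP_PPC_MMU_RADIX returning non-zero */
    bool has_v3_mmu_ioctl;  /* models KVM_PPC_CONFIGURE_V3_MMU being implemented */
};

/*
 * Model of the H_REGISTER_PROC_TBL handler with the workaround applied:
 * when the host cannot do radix, the HPT was already allocated at machine
 * reset, so there is nothing left to configure and the hcall succeeds.
 */
long model_h_register_proc_tbl(const struct host_caps *caps)
{
    if (!caps->radix_capable)
        return H_SUCCESS;       /* hash-only host: treat the hcall as a nop */

    if (!caps->has_v3_mmu_ioctl)
        return H_PARAMETER;     /* the ioctl would fail, so the hcall fails */

    return H_SUCCESS;           /* ioctl path assumed to succeed */
}
```

With PR KVM in a hash-only L1 (both fields false) the model returns
H_SUCCESS, so the guest never reaches the BUG() above.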

So I'm wondering whether the guest should even call H_REGISTER_PROC_TBL in this
case, since there's nothing to do?

Also, peeking into PAPR, I see that H_REGISTER_PROC_TBL is mandatory only "If
the platform supports the In-Memory Table Translation Option", which isn't
the case here. This is supposed to be advertised through the "hcall-imtt"
function set in the OF property "ibm,hypertas-functions" in the /rtas node.

I guess a correct behavior would be for QEMU to advertise "hcall-imtt"
when it supports both radix and hash, and the kernel should only call
H_REGISTER_PROC_TBL if it is available.
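
As a rough sketch of the guest-side check, assuming "ibm,hypertas-functions"
is a list of NUL-terminated strings packed back to back: the helper name
strlist_contains() is made up for illustration and is not an existing kernel
function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'name' is one of the NUL-terminated strings packed into
 * the 'len' bytes at 'prop'. A trailing entry without a terminating NUL
 * is treated as malformed and ignored.
 */
bool strlist_contains(const char *prop, size_t len, const char *name)
{
    size_t off = 0;

    while (off < len) {
        const char *s = prop + off;
        const char *end = memchr(s, '\0', len - off);

        if (!end)
            break;                      /* unterminated final entry */
        if (strcmp(s, name) == 0)
            return true;
        off += (size_t)(end - s) + 1;   /* skip the string and its NUL */
    }
    return false;
}
```

The guest would then only issue H_REGISTER_PROC_TBL when
strlist_contains(prop, len, "hcall-imtt") is true for the property value.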

Of course, neither QEMU nor the kernel seems to care about "hcall-imtt" today...
so I guess the easiest way is to fix H_REGISTER_PROC_TBL in QEMU.

> Another thing to try would be to do the same test on a POWER8.
> 

No surprise, it continues to work on a POWER8, since:

               /*
                * On POWER9, we need to do a H_REGISTER_PROC_TBL hcall
                * to inform the hypervisor that we wish to use the HPT.
                */
               if (cpu_has_feature(CPU_FTR_ARCH_300))
                       register_process_table(0, 0, 0);

> Paul.

Cheers,

--
Greg