On Thu, Jun 13, 2013 at 02:34:56PM +0200, Alexander Graf wrote:
> Hi Paul,
>
> We've just seen another KVM bug with 3.8 on p7. It looks as if for
> some reason a bolted HTAB entry for the kernel got evicted.
> ...
> (gdb) x /i 0xc000000000005d00
>    0xc000000000005d00 <instruction_access_common>:  andi.  r10,r12,16384
> (qemu) xp /i 0x5d00
> 0x0000000000005d00:  andi.  r10,r12,16384
> (qemu) info tlb
>  SLB  ESID                VSID
>   3   0xc000000008000000  0x0000c00838795000
>
> So for some reason QEMU can still resolve the virtual address using
> the guest HTAB, but the CPU can not. Otherwise the guest wouldn't get
> a 0x400 when accessing that page.

When I've seen this sort of thing it has usually been that we failed to
insert an HPTE in htab_bolt_mapping(), called from htab_initialize().
When that happens we BUG_ON(), which is stupid, because it causes a
program interrupt, and the first thing we do is turn the MMU on, but we
don't have a linear mapping set up, so we start taking continual
instruction storage interrupts (because the ISI handler also wants to
turn the MMU back on).

Ben has an idea to fix that, which is to have IR and DR off in
paca->kernel_msr until we're ready to turn the MMU on. That might help
debuggability in the case you're hitting, whether or not it's
htab_bolt_mapping() failing.

Are you *absolutely* sure that QEMU is using the guest HTAB to
translate the 0xc... addresses? If it is actually doing so, it would
need to be using the relatively new KVM_PPC_GET_HTAB_FD ioctl, and I
thought the only place that was used was in the migration code.

To debug this sort of thing, what I usually do is patch the guest
kernel to put a branch-to-self at 0x400. Then when it hangs you have
some chance of sorting out what happened using info registers etc.

I would be very interested to know how big an HPT the host kernel
allocated for the guest and what was in it. The host kernel prints a
message telling you the size and location of the HPT, and in this sort
of situation I find it helpful to take a copy of it with dd and dump it
with hexdump.

Also, what page size are you using in the host kernel? If it's 4k, then
the guest kernel is limited to using 4k pages for the linear mapping,
which can mean it runs out of space in the HPT for the linear mapping
more easily. Since you don't have my patch to add a flexible allocator
for the HPT and RMA areas (you rejected it, if you recall), you'll be
limited to what you can allocate from the page allocator, which is
usually 16MB, but may be less if free memory is low and/or fragmented.
16MB should be enough for a 3GB guest, particularly if you're using 64k
pages in the host, but if the host was only able to allocate a much
smaller HPT, that might explain the problem.

Let me know if you discover anything further...

Paul.
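
P.S. In case it helps in checking what the host thinks is in the guest
HPT: here is a rough, untested sketch of reading the bolted entries back
through the KVM_PPC_GET_HTAB_FD ioctl. The struct and flag names are the
ones exported in the kernel headers, as best I remember them; the
dump_bolted_hptes() function, the 64k buffer, and the assumption that
you already have the VM fd in hand (i.e. this runs inside QEMU or
whatever created the VM) are just for illustration, and error handling
is mostly omitted.

/*
 * Untested sketch: dump the bolted HPT entries of a guest via the
 * KVM_PPC_GET_HTAB_FD ioctl.  vmfd is the KVM VM file descriptor,
 * which only the process that created the VM has.
 */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int dump_bolted_hptes(int vmfd)
{
	struct kvm_get_htab_fd ghf = {
		.flags = KVM_GET_HTAB_BOLTED_ONLY,	/* bolted entries only */
		.start_index = 0,			/* from the start of the HPT */
	};
	unsigned char buf[65536];
	ssize_t n;
	int htab_fd;

	htab_fd = ioctl(vmfd, KVM_PPC_GET_HTAB_FD, &ghf);
	if (htab_fd < 0) {
		perror("KVM_PPC_GET_HTAB_FD");
		return -1;
	}

	/*
	 * Each read() returns a series of chunks: an 8-byte header
	 * (index, n_valid, n_invalid) followed by n_valid HPTEs of
	 * 16 bytes each.  Invalid entries are only counted, not
	 * written out.  A read that returns 0 means there is nothing
	 * new to report.
	 */
	while ((n = read(htab_fd, buf, sizeof(buf))) > 0) {
		unsigned char *p = buf, *end = buf + n;

		while (p + sizeof(struct kvm_get_htab_header) <= end) {
			struct kvm_get_htab_header *hdr = (void *)p;
			uint64_t *hpte = (uint64_t *)(hdr + 1);
			int i;

			for (i = 0; i < hdr->n_valid; i++)
				printf("index %u: v=%016llx r=%016llx\n",
				       hdr->index + i,
				       (unsigned long long)hpte[2 * i],
				       (unsigned long long)hpte[2 * i + 1]);
			p = (unsigned char *)&hpte[2 * hdr->n_valid];
		}
	}
	close(htab_fd);
	return 0;
}

Comparing that output with the copy of the HPT you take with dd should
show quickly whether the bolted kernel entries are really gone from the
table or whether the CPU is looking somewhere else entirely.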
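
P.P.S. The back-of-the-envelope numbers behind the 16MB comment, in case
they're useful when you look at the dump (treat them as rough, they only
count the linear mapping):

	16MB HPT                   = 16 MB / 16 B  = 1,048,576 HPTE slots
	3GB linear map, 4k pages   = 3 GB / 4 kB   = 786,432 HPTEs (~75% of the slots)
	3GB linear map, 64k pages  = 3 GB / 64 kB  = 49,152 HPTEs (<5% of the slots)

At 75% full it only takes a bit of bad luck for both the primary and
secondary PTEG for some address to be full by the time
htab_bolt_mapping() gets to it, which is the failure mode described
above; at 5% that's essentially impossible. An HPT much smaller than
16MB makes the 4k case correspondingly worse.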