Re: Loongson 3 kernel crashes with PAGE_EXTENSION and PAGE_POISONING

Hi,

On 22/04/17 00:13, Aurelien Jarno wrote:
> Hi all,
> 
> The Debian kernel recently enabled the PAGE_EXTENSION and PAGE_POISONING
> options. Unfortunately this causes a kernel crash very early during the
> boot on Loongson 3 machines:
[...]
> Adding page_poison=0 to the command line improves the things a bit, the
> kernel is able to boot, but crashes a bit later in different ways:
[...]
> Note that the malta and octeon flavours are not affected by this bug, so
> it looks like Loongson 3 specific. Any help to find the root cause would
> be appreciated.

I have investigated this a bit, although I haven't been able to get to
the very bottom of it.

The PAGE_EXTENSION option is a red herring. The bug actually occurs
whenever the kernel uses a very large amount of .bss. The Loongson
kernel allocates this much .bss because it enables SPARSEMEM together
with SPARSEMEM_STATIC, which has this documented side effect:

(From mm/Kconfig SPARSEMEM_STATIC)
> # SPARSEMEM_EXTREME (which is the default) does some bootmem
> # allocations when memory_present() is called.  If this cannot
> # be done on your architecture, select this option.  However,
> # statically allocating the mem_section[] array can potentially
> # consume vast quantities of .bss, so be careful.

On Loongson about 16M of .bss is allocated for the mem_section array.
When PAGE_EXTENSION is enabled, this doubles to 32M.
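
As a rough sanity check of those sizes, here is a minimal sketch of the
arithmetic in plain userspace C. The numbers fed into it are assumptions,
not values stated above: 48 physical address bits, 256 MiB (2^28 byte)
sections, and a mem_section entry that grows from 16 to 32 bytes when
PAGE_EXTENSION is enabled. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in bytes of a statically allocated mem_section[] array.
 *
 * All three parameters are assumptions chosen for illustration; check
 * the real values for your kernel version and configuration.
 */
uint64_t mem_section_array_bytes(unsigned max_physmem_bits,
                                 unsigned section_size_bits,
                                 uint64_t entry_bytes)
{
    /* Guard against nonsensical inputs and shift overflow. */
    assert(max_physmem_bits >= section_size_bits);
    assert(max_physmem_bits - section_size_bits < 40);

    /* One entry per possible section in the physical address space. */
    uint64_t nr_sections =
        UINT64_C(1) << (max_physmem_bits - section_size_bits);

    return nr_sections * entry_bytes;
}
```

Under those assumptions, mem_section_array_bytes(48, 28, 16) comes to
16 MiB and mem_section_array_bytes(48, 28, 32) to 32 MiB, which is
consistent with the figures above.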

Having a large .bss shifts the dentry cache allocation into a region
containing physical address 0x0B020000. When the Loongson boots,
something early in the boot process writes garbage to this address,
which later crashes the kernel (since it's in the middle of the dentry
cache). Unfortunately I have no idea what might be causing this, but if
I hack arch/mips/kernel/setup.c to reserve 0x0B020000 - 0x0B040000 then
the crashes disappear.
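
To check whether a given early allocation is exposed to this, the
following userspace sketch tests a physical range against the reserved
window. It is only an illustration, not the setup.c hack itself: the
helper name is hypothetical, and treating 0x0B040000 as an exclusive end
(a 128 KiB window) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Window reserved by the workaround; end is assumed to be exclusive. */
#define SUSPECT_START UINT64_C(0x0B020000)
#define SUSPECT_END   UINT64_C(0x0B040000)

/*
 * Return true if the physical range [phys_start, phys_start + size)
 * overlaps the suspect window. Assumes phys_start + size does not
 * wrap around.
 */
bool overlaps_suspect_window(uint64_t phys_start, uint64_t size)
{
    if (size == 0)
        return false;   /* an empty range overlaps nothing */

    uint64_t phys_end = phys_start + size;

    return phys_start < SUSPECT_END && phys_end > SUSPECT_START;
}
```

For example, a table placed at 0x0B000000 with a size of 0x40000 bytes
covers the window, while one that ends exactly at 0x0B020000 does not.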

The second workaround I have is to enable NUMA and disable
SPARSEMEM_STATIC. This prevents the large .bss, and I think it's safe
because loongson64 uses some alternative memory initialization code when
NUMA is enabled and only calls memory_present() at the end. However, I'm
not sure if it works on multi-node Loongsons (like the 3B) since I don't
have any to test it on.

Hopefully that helps a little.

James