Re: IP27: CONFIG_TRANSPARENT_HUGEPAGE triggers bus errors

On 11/10/2014 12:03, Ralf Baechle wrote:
> On Mon, Nov 10, 2014 at 08:55:09AM -0800, David Daney wrote:
> 
>> Yes, you may be on to something here.  Certainly basic huge TLB support must
>> be in place for TRANSPARENT_HUGEPAGE to work.
>>
>> It could be that the Kconfig symbols for the various portions of huge page
>> support are missing the required dependencies.
>>
>> FWIW, I always build with the huge page Kconfig options set.
>>
>> I have:
>> $ grep HUGE .config
>> CONFIG_SYS_SUPPORTS_HUGETLBFS=y
>> CONFIG_MIPS_HUGE_TLB_SUPPORT=y
>> CONFIG_CPU_SUPPORTS_HUGEPAGES=y
>> CONFIG_TRANSPARENT_HUGEPAGE=y
>> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
>> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
>> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
>> CONFIG_HUGETLBFS=y
>> CONFIG_HUGETLB_PAGE=y
>>
>> I suspect that you may not need CONFIG_HUGETLBFS, but CONFIG_HUGETLB_PAGE is
>> probably essential.
> 
> IP27 also has NUMA as the only in-tree MIPS system - and its NUMA support
> is not in the best support state to say the least.  Just an observation -
> at this point in time there is no obvious connection between either
> 
>   R10000 <-> transparent huge page
> 
> or
> 
>   NUMA <-> transparent huge page
> 
>   Ralf

I briefly tried NUMA on the Onyx2, and it failed to load init.  init actually
spat out its --help info and quit, which panicked the kernel.  So I didn't test
that too much more.  I am also booting an 'M' kernel, not an 'N'.

That said, I went back to playing around with the Octane, which also seems to
have issues when CONFIG_TRANSPARENT_HUGEPAGE is present.  I now think that it's
not hugepages support at all, but something in the code covered by
CONFIG_MIGRATION.

Booting a 3.17.2 kernel on the Octane with CONFIG_TRANSPARENT_HUGEPAGE but
without CONFIG_HUGETLBFS (and, consequently, without CONFIG_HUGETLB_PAGE)
didn't immediately trigger my instruction bus errors upon loading init, despite
multiple cold reboots.  It took several tries before I could get 3.17.2 to
trigger it.

Backtracking to 3.16, I found out that I could trigger the problem virtually
every single cold boot on 3.16.4, but NOT 3.16.5.  Going through 3.16.5's
changelog, I tried backing out several commits that dealt with transparent
hugepages, jiffies calculation, and finally hit on this one:
http://git.linux-mips.org/?p=ralf/linux.git;a=commit;h=e9203e7b4019370e6d8f69cbf71c052aad22ced7

"""
commit d3cb8bf6081b8b7a2dabb1264fe968fd870fa595 upstream.

A migration entry is marked as write if pte_write was true at the time the
entry was created. The VMA protections are not double checked when migration
entries are being removed as mprotect marks write-migration-entries as
read. It means that potentially we take a spurious fault to mark PTEs write
again but it's straight-forward. However, there is a race between write
migrations being marked read and migrations finishing. This potentially
allows a PTE to be write that should have been read. Close this race by
double checking the VMA permissions using maybe_mkwrite when migration
completes.
"""
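
For readers who have not looked at that change, the idea in the commit message
can be sketched with a toy Python model.  This is not kernel code: the Vma and
Pte classes, the restore_old/restore_new helpers, and the VM_WRITE value are
illustrative stand-ins.  Only the behaviour of maybe_mkwrite (make the PTE
writable only if the VMA currently permits writes) is taken from the commit
description.

```python
from dataclasses import dataclass

VM_WRITE = 0x2  # illustrative flag value for this toy model


@dataclass
class Vma:
    flags: int  # protection flags of the mapping (toy stand-in)


@dataclass
class Pte:
    writable: bool = False  # toy stand-in for the PTE write bit


def maybe_mkwrite(pte, vma):
    """Mark the PTE writable only if the VMA currently allows writes."""
    if vma.flags & VM_WRITE:
        pte.writable = True
    return pte


def restore_old(entry_is_write, vma):
    """Pre-fix behaviour: trust the write bit saved in the migration entry."""
    return Pte(writable=entry_is_write)


def restore_new(entry_is_write, vma):
    """Post-fix behaviour: double check the VMA before restoring write."""
    pte = Pte()
    if entry_is_write:
        pte = maybe_mkwrite(pte, vma)
    return pte
```

In this model, a write-migration entry that finishes after the mapping has been
made read-only comes back writable under restore_old but read-only under
restore_new, which is the race the commit says it closes.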

CONFIG_MIGRATION is enabled by default when you select
CONFIG_TRANSPARENT_HUGEPAGE, and when I backed that patch out of 3.16.5, the
frequency of a cold boot resulting in IBE's upon loading init increased -- 6
out of 7 reboots in one test run.

Leaving that patch backed out, I enabled CONFIG_HUGETLBFS and
CONFIG_HUGETLB_PAGE, and so far, out of five cold boots, all boot up fine.
This mirrors the behavior on the IP27 machine where CONFIG_HUGETLBFS seems to
fix problems.  I tried backing the migration patch out on the IP27 kernel and
it doesn't seem to have an effect there.

This seems to suggest that CONFIG_MIGRATION plays a part somehow, but only if
CONFIG_HUGETLB_PAGE is left out.  Doesn't look like CONFIG_HUGETLBFS matters,
as I haven't mounted that filesystem anywhere.
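
To make the suspect combination easy to check for, here is a minimal Python
sketch that scans the text of a .config file.  The helper names (parse_config,
suspect_combo) are hypothetical, and the rule it encodes is only the hypothesis
drawn from the observations above (CONFIG_TRANSPARENT_HUGEPAGE and
CONFIG_MIGRATION on, CONFIG_HUGETLB_PAGE off), not a confirmed diagnosis.

```python
import re


def parse_config(text):
    """Map each CONFIG_ symbol to its value; '# CONFIG_X is not set' maps to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def suspect_combo(opts):
    """True when THP and migration are built in but HUGETLB_PAGE is not."""
    def on(symbol):
        # Symbols missing from the file are treated as disabled.
        return opts.get(symbol, "n") == "y"

    return (on("CONFIG_TRANSPARENT_HUGEPAGE")
            and on("CONFIG_MIGRATION")
            and not on("CONFIG_HUGETLB_PAGE"))
```

Usage would be along the lines of suspect_combo(parse_config(open(".config").read())),
run from the top of the kernel build tree.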

The symptoms on each system are different -- I only get IBE's on Octane,
sometimes mixed with DBE's, and usually when init loads.  If, by luck, init
loads, the IBE's are not likely to happen and the machine seems to run fine.  I
also confirmed that the R12K module on Octane suffers the same problem -- seems
to be a bit more resilient, though.

IP27 only ever gets DBE's, and not usually while loading init, but when
executing other userland programs, like Gentoo's emerge (written in Python).

It also looks like turning on CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE fixed my
problems on Octane w/ PAGE_SIZE_16K/PAGE_SIZE_64K triggering random
sigbus/sigsegv signals, too (if anyone remembers that mail thread from a few
months ago).

So I'm curious why CONFIG_HUGETLB_PAGE is hidden and selected only with
CONFIG_HUGETLBFS?  It does cause arch/mips/mm/hugetlbpage.c to get built, so
maybe that's the critical part?  If so, it seems that for MIPS, it should be
in the 'Kernel type' menu w/ CONFIG_TRANSPARENT_HUGEPAGE, and not hidden away
deep in the 'File systems' submenu.

--J




