On 11/10/2014 12:03, Ralf Baechle wrote:
> On Mon, Nov 10, 2014 at 08:55:09AM -0800, David Daney wrote:
>
>> Yes, you may be on to something here.  Certainly basic huge TLB support must
>> be in place for TRANSPARENT_HUGEPAGE to work.
>>
>> It could be that the Kconfig symbols for the various portions of huge page
>> support are missing the required dependencies.
>>
>> FWIW, I always build with a huge page Kconfig options set.
>>
>> I have:
>> $ grep HUGE .config
>> CONFIG_SYS_SUPPORTS_HUGETLBFS=y
>> CONFIG_MIPS_HUGE_TLB_SUPPORT=y
>> CONFIG_CPU_SUPPORTS_HUGEPAGES=y
>> CONFIG_TRANSPARENT_HUGEPAGE=y
>> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
>> # CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
>> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
>> CONFIG_HUGETLBFS=y
>> CONFIG_HUGETLB_PAGE=y
>>
>> I suspect that you may not need CONFIG_HUGETLBFS, but CONFIG_HUGETLB_PAGE is
>> probably essential.
>
> IP27 also has NUMA as the only in-tree MIPS system - and its NUMA support
> is not in the best support state to say the least.  Just an observation -
> at this point in time there is no obvious connection between either
>
>   R10000 <-> transparent huge page
>
> or
>
>   NUMA <-> transparent huge page
>
>   Ralf

I briefly tried NUMA on the Onyx2, and it failed to load init.  init
actually spat out its --help info and quit, which panicked the kernel.
So I didn't test that too much more.  I am also booting an 'M' kernel,
not an 'N'.

That said, I went back to playing around with the Octane, which also
seems to have issues when CONFIG_TRANSPARENT_HUGEPAGE is present.  I now
think that it's not hugepages support at all, but something in the code
covered by CONFIG_MIGRATION.

Booting a 3.17.2 kernel on the Octane with CONFIG_TRANSPARENT_HUGEPAGE
but without CONFIG_HUGETLBFS (and, consequently, without
CONFIG_HUGETLB_PAGE) didn't immediately trigger my instruction bus
errors upon loading init, despite multiple cold reboots.  It took
several tries before I could get 3.17.2 to trigger it.

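For anyone who wants to compare configs, here is a rough sketch of a
check for the combination I'm describing (CONFIG_TRANSPARENT_HUGEPAGE
set, CONFIG_HUGETLB_PAGE not).  The function name check_hugepage_config
and the wording of its output are made up for illustration; this is not
an existing kernel script, and it only looks for "=y" lines in a .config:

```shell
# Hypothetical helper, not part of the kernel tree.
# Usage: check_hugepage_config [path/to/.config]   (defaults to ./.config)
check_hugepage_config() {
    config="${1:-.config}"

    for sym in TRANSPARENT_HUGEPAGE MIGRATION HUGETLB_PAGE HUGETLBFS; do
        # Enabled bool symbols appear as CONFIG_FOO=y.  Anything else
        # (absent, or "# CONFIG_FOO is not set") is reported as not set.
        # The trailing $ keeps CONFIG_TRANSPARENT_HUGEPAGE from matching
        # CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS.
        if grep -q "^CONFIG_${sym}=y\$" "$config"; then
            echo "CONFIG_${sym}=y"
        else
            echo "CONFIG_${sym} is not set"
        fi
    done

    # Flag the combination that seems to misbehave on Octane and IP27.
    if grep -q '^CONFIG_TRANSPARENT_HUGEPAGE=y$' "$config" &&
       ! grep -q '^CONFIG_HUGETLB_PAGE=y$' "$config"; then
        echo "note: THP enabled without HUGETLB_PAGE"
    fi
}
```

Run it as "check_hugepage_config path/to/.config"; with no argument it
reads ./.config from the current directory.
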
Backtracking to 3.16, I found out that I could trigger the problem
virtually every single cold boot on 3.16.4, but NOT 3.16.5.  Going
through 3.16.5's changelog, I tried backing out several commits that
dealt with transparent hugepages and jiffies calculation, and finally
hit on this one:

http://git.linux-mips.org/?p=ralf/linux.git;a=commit;h=e9203e7b4019370e6d8f69cbf71c052aad22ced7

"""
commit d3cb8bf6081b8b7a2dabb1264fe968fd870fa595 upstream.

A migration entry is marked as write if pte_write was true at the time
the entry was created.  The VMA protections are not double checked when
migration entries are being removed as mprotect marks
write-migration-entries as read.  It means that potentially we take a
spurious fault to mark PTEs write again but it's straight-forward.
However, there is a race between write migrations being marked read and
migrations finishing.  This potentially allows a PTE to be write that
should have been read.  Close this race by double checking the VMA
permissions using maybe_mkwrite when migration completes.
"""

CONFIG_MIGRATION is enabled by default when you select
CONFIG_TRANSPARENT_HUGEPAGE, and when I backed that patch out of 3.16.5,
the frequency of a cold boot resulting in IBEs upon loading init
increased -- 6 out of 7 reboots in one test run.

Leaving that patch backed out, I enabled CONFIG_HUGETLBFS and
CONFIG_HUGETLB_PAGE, and so far, out of five cold boots, all boot up
fine.  This mirrors the behavior on the IP27 machine, where
CONFIG_HUGETLBFS seems to fix problems.  I tried backing the migration
patch out on the IP27 kernel and it doesn't seem to have an effect
there.

This seems to suggest that CONFIG_MIGRATION plays a part somehow, but
only if CONFIG_HUGETLB_PAGE is left out.  It doesn't look like
CONFIG_HUGETLBFS itself matters, as I haven't mounted that filesystem
anywhere.

The symptoms on each system are different -- I only get IBEs on Octane,
sometimes mixed with DBEs, and usually when init loads.
If, by luck, init loads, the IBEs are not likely to happen and the
machine seems to run fine.  I also confirmed that the R12K module on
Octane suffers the same problem -- it seems to be a bit more resilient,
though.  IP27 only ever gets DBEs, and not usually while loading init,
but when executing other userland programs, like Gentoo's emerge
(written in Python).

It also looks like turning on CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE
fixed my problems on Octane w/ PAGE_SIZE_16K/PAGE_SIZE_64K triggering
random sigbus/sigsegv signals, too (if anyone remembers that mail thread
from a few months ago).

So I'm curious why CONFIG_HUGETLB_PAGE is hidden and selected only with
CONFIG_HUGETLBFS?  It does cause arch/mips/mm/hugetlbpage.c to get
built, so maybe that's the critical part?  If so, it seems that for
MIPS, it should be in the 'Kernel type' menu w/
CONFIG_TRANSPARENT_HUGEPAGE, and not hidden away deep in the
'File systems' submenu.

--J