Hi, On Wed, Jun 22, 2016 at 03:05:05PM -0700, David Daney wrote: > This is caused by a config bug. > > For THP to work you must have both: > > CONFIG_TRANSPARENT_HUGEPAGE=y > and > CONFIG_HUGETLBFS=y Oh... I guess this is with MIPS only? > Please try testing with both of those set as well as applying: > > https://www.linux-mips.org/archives/linux-mips/2016-06/msg00397.html Works! Now the system is stable. EBH5600 built dozen of different packages without any issues and THP being used: root@localhost:~$ grep thp /proc/vmstat thp_fault_alloc 2271 thp_fault_fallback 0 thp_collapse_alloc 2049 thp_collapse_alloc_failed 0 thp_split_page 0 thp_split_page_failed 0 thp_deferred_split_page 3996 thp_split_pmd 186 thp_zero_page_alloc 0 thp_zero_page_alloc_failed 0 Thanks a lot, A. > I will look into either a Kconfig fix, or fixing the code that currently > depends on CONFIG_HUGETLBFS, but is needed for all huge pages. > > The faults I saw are caused by: > > #define pmd_huge(x) 0 > > In include/linux/hugetlb.h > > Really we need to replace all occurrences of pmd_huge() under arch/mips with > something like pte_huge(), but I don't know if that is sufficient. There > may be other things gated by CONFIG_HUGETLBFS that I didn't see. > > David. > > On 05/23/2016 08:13 AM, Aaro Koskinen wrote: > >Hi, > > > >I'm getting kernel crashes (see below) reliably when building Perl in > >parallel (make -j16) on OCTEON EBH5600 board (8 cores, 4 GB RAM) with > >Linux 4.6. > > > >It seems that CONFIG_TRANSPARENT_HUGEPAGE has something to do with the > >issue - disabling it makes build go through fine. > > > >Any ideas? > > > >A. > > > >[ 2457.467155] Got mcheck at 00000001200a82b4 > >[ 2457.479447] CPU: 6 PID: 15916 Comm: lib/unicore/mkt Not tainted 4.6.0-octeon-distro.git-v2.16-1-gfc3b10e-dirty-00001-g16a7aa0 #1 > >[ 2457.514121] task: 80000000eccf2b80 ti: 80000000ecda4000 task.ti: 80000000ecda4000 > >[ 2457.536551] $ 0 : 0000000000000000 3e000000105bc006 0000000000000000 ffffffff957e4728 > >[ 2457.560686] $ 4 : 00000000000000f2 0000000000000067 000000012015e8ab 00000000332295cf > >[ 2457.584822] $ 8 : 0000000000000000 0000000000000000 0000000000000001 0000000000000003 > >[ 2457.608957] $12 : 00000001204e04d8 0000000000000008 0000000000000001 ffffffffffffffff > >[ 2457.633093] $16 : 0000000120383d60 00000001203a3828 00000000332295cf 000000000000000b > >[ 2457.657228] $20 : 000000012015e8a0 0000000000000000 000000000000000c 0000000000000000 > >[ 2457.681363] $24 : 0000000000000010 00000001200a80e8 > >[ 2457.705496] $28 : 00000001201a0300 000000ffffda82a0 000000012019b9b8 0000000120039f5c > >[ 2457.729631] Hi : 0000000000000000 > >[ 2457.740341] Lo : 0000000000000008 > >[ 2457.751055] epc : 00000001200a82b4 0x1200a82b4 > >[ 2457.764891] ra : 0000000120039f5c 0x120039f5c > >[ 2457.778726] Status: 00308cf3 KX SX UX USER EXL IE > >[ 2457.793284] Cause : 00800060 (ExcCode 18) > >[ 2457.805296] PrId : 000d0409 (Cavium Octeon+) > >[ 2457.818350] Index : 80000000 > >[ 2457.827759] PageMask : 1fe000 > >[ 2457.836646] EntryHi : 00000001203820f4 > >[ 2457.848136] EntryLo0 : 00000000105b8006 > >[ 2457.859628] EntryLo1 : 00000000105bc006 > >[ 2457.871119] Wired : 0 > >[ 2457.878704] PageGrain: e0000000 > >[ 2457.888111] > >[ 2457.892573] Index: 25 pgmask=4kb va=00120456000 asid=f4 > >[ 2457.908256] [ri=0 xi=0 pa=000e47d3000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000c31bc000 c=0 d=1 v=1 g=0] > >[ 2457.935230] Index: 26 pgmask=4kb va=001200a8000 asid=f4 > >[ 2457.950915] [ri=0 xi=0 pa=000e0e1c000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=000c50ed000 c=0 d=0 v=1 g=0] > >[ 2457.977888] Index: 27 pgmask=4kb va=001203a2000 asid=f4 > >[ 2457.993574] [ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=0] [ri=0 xi=1 pa=0009005a000 c=1 d=0 v=1 g=0] > >[ 2458.020548] > >[ 2458.025008] > >Code: de100000 1200001c 00000000 <de110008> 8e220000 1452fffa 00000000 8e220004 1453fff7 > >[ 2458.054470] Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB. > >[ 2458.087614] ---[ end Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB. > >[ 2458.122835] > >do_page_fault(): sending SIGSEGV to make for invalid write access to 0000000000000012[ 2458.149565] > >[ 2458.149565] do_page_fault(): sending SIGSEGV to miniperl for invalid write access to 0000000000000010epc = 0000000120089500 in miniperl[120000000+181000]ra = 00000001200c18a4 in miniperl[120000000+181000][ 2458.149590] > > > >[ 2458.212999] epc = 0000000120015400 in make[120000000+35000] > >[ 2458.229780] ra = 000000ffeca7f570 in[ 2458.240797] > > > >*** NMI Watchdog interrupt on Core 0x0 *** > > > >A. > > > > >