On 07/05/2016 11:10, Ralf Baechle wrote:
> On Thu, Jun 16, 2016 at 03:50:31PM -0700, David Daney wrote:
>
>> From: David Daney <david.daney@xxxxxxxxxx>
>>
>> When the core THP code is modifying the permissions of a huge page it
>> calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
>> of the page table entry.  The result can be kernel messages like:
>>
>> mm/memory.c:397: bad pmd 000000040080004d.
>> mm/memory.c:397: bad pmd 00000003ff00004d.
>> mm/memory.c:397: bad pmd 000000040100004d.
>>
>> or:
>>
>> ------------[ cut here ]------------
>> WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
>> Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
>> CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
>> Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
>>         0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
>>         0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
>>         0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
>>         0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
>>         ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
>>         0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
>>         80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
>>         80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
>>         0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
>>         ...
>> Call Trace:
>> [<ffffffff80865c4c>] show_stack+0x6c/0xf8
>> [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
>> [<ffffffff809207a0>] exit_mmap+0x150/0x158
>> [<ffffffff80882d44>] mmput+0x5c/0x110
>> [<ffffffff8088b450>] do_exit+0x230/0xa68
>> [<ffffffff8088be34>] do_group_exit+0x54/0x1d0
>> [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18
>>
>> ---[ end trace c7b38293191c57dc ]---
>> BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
>>
>> Fix by not clearing _PAGE_HUGE bit.
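For readers who have not seen the patch itself, the failure mode described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the bit positions, the `PROT_BITS` and `PAGE_CHG_MASK` definitions, and the `pmd_modify_buggy`/`pmd_modify_fixed` names are hypothetical and do not reproduce the real MIPS page table layout or the kernel's actual code.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only (not the real MIPS values). */
#define PROT_BITS      UINT64_C(0x0f)             /* protection bits */
#define PAGE_HUGE      UINT64_C(0x10)             /* marks a huge-page entry */
#define PAGE_CHG_MASK  (~(PROT_BITS | PAGE_HUGE)) /* bits kept on a protection change */

/* Old behaviour: only the PAGE_CHG_MASK bits survive, so PAGE_HUGE is lost
 * unless the new protection value happens to carry it. */
uint64_t pmd_modify_buggy(uint64_t pmd, uint64_t newprot)
{
    return (pmd & PAGE_CHG_MASK) | (newprot & ~PAGE_CHG_MASK);
}

/* Fixed behaviour: explicitly carry PAGE_HUGE over from the old entry. */
uint64_t pmd_modify_fixed(uint64_t pmd, uint64_t newprot)
{
    return (pmd & (PAGE_CHG_MASK | PAGE_HUGE)) | (newprot & ~PAGE_CHG_MASK);
}
```

With this made-up layout, an entry of 0x40080001d (huge bit set) and a new protection of 0x05 yields 0x400800005 from the first function, which no longer looks like a huge-page entry, and 0x400800015 from the second, which still does.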
>
> I resolved the conflict with my recent other fix for pmd_modify
> and just applied and pushed this.
>
>   Ralf

Eh, it looks like I've stumbled into another odd corner case of THP.  It only
affects the SGI Octane so far, so it might be an Octane bug, but I'm at a loss
to explain why or how.

If I have THP/HugeTLBFS enabled, BUT disable only CONFIG_CPU_IDLE_GOV_LADDER
(while keeping the Menu governor and basic idle support in), then on userland
boot, there's about a 1-in-2 chance it'll start to throw instruction bus
errors.  If I keep the ladder governor compiled in, no bus errors.

The other way to trigger it, regardless of the above condition, is to modify
arch/mips/kernel/idle.c and force the R1x000 CPUs to use 'r4k_wait' for
cpu_wait.  Compile and run that, and you get an IBE on virtually every boot.

If I disable THP/HugeTLBFS, then with either of the conditions above, the
system appears to boot fine.

I honestly have no idea if the R10000 family of CPUs even supports the 'wait'
instruction, as I can't find any solid documentation on it (except for one
vague NEC reference).  I am not seeing any illegal instruction issues arising
from its use, though, unless the R10k treats it as a nop or such.

That said, THP does appear to work now on both IP27 and IP30.  IP27 seems to
run it fine without the CPU idle framework at all.  It doesn't register very
often in /proc/vmstat, though.

Thoughts?

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic