Thomas, can you test CONFIG_TRANSPARENT_HUGEPAGE on an IP28?

All in all the R10000's TLB is unproblematic; my gut feeling is that
something else specific to IP27 is spoiling the broth.

  Ralf

On Mon, Nov 10, 2014 at 02:04:10AM -0500, Joshua Kinard wrote:

> Date: Mon, 10 Nov 2014 02:04:10 -0500
> From: Joshua Kinard <kumba@xxxxxxxxxx>
> To: David Daney <ddaney.cavm@xxxxxxxxx>
> CC: Ralf Baechle <ralf@xxxxxxxxxxxxxx>, Linux MIPS List
>  <linux-mips@xxxxxxxxxxxxxx>
> Subject: Re: IP27: CONFIG_TRANSPARENT_HUGEPAGE triggers bus errors
> Content-Type: text/plain; charset=windows-1252
>
> On 11/08/2014 19:09, Joshua Kinard wrote:
> > On 11/07/2014 13:30, David Daney wrote:
> >> On 11/07/2014 02:22 AM, Joshua Kinard wrote:
> >> [...]
> >>>
> >>> So my guess is unless hugepages can happen in powers of 4,
> >>
> >> Huge pages are currently only supported on MIPS64 for this reason.
> >>
> >> huge_page_mask_size = (normal_page_size/8 * normal_page_size) / 2;
> >>
> >> If you take log2 of everything you get
> >>
> >> huge_page_mask_bits = normal_page_bits - 3 + normal_page_bits - 1
> >>                     = 2 * normal_page_bits - 4 (always even)
> >>
> >> So all page sizes result in huge pages that meet the power of 4 criterion.
> >
> > Well, looks like I'll have to bisect to hunt the problem down.  Obviously there
> > is something with transparent hugepages that the R10K-family dislikes.  Just a
> > question of "what?".  Seems like I'm the only one left with this kind of
> > equipment and interest to play with it :)
>
> I gave up on bisecting this.  3.7 and 3.9 kernels are not bootable on my Onyx2
> w/o additional patches to fix the PCI probing code to deal with the card cage I
> have in my system (basically, it stops probing after it discovers the first PCI
> bus).  Even with that fixed, normal init refused to load on those kernels, and
> dash as init just outright crashed.  Must be some other IP27 bug that was fixed
> at some point, and I didn't feel like applying multiple patches to every bisect
> checkout, which might've altered results and led me to blaming the wrong commit.
>
> It does look like the PageMask register is getting set to the correct values on
> PAGE_SIZE_4K and PAGE_SIZE_16K when a hugepage is needed (PM_1M and PM_16M).
> The PAGE_SIZE_64K case wouldn't be valid on R10k, as that uses PM_256M for a
> hugepage, which is bits 28:13 in PageMask and that would lead to "undefined
> behavior".  I'm assuming another register is getting set to an incorrect value
> in the huge page case (EntryLo0 or EntryLo1?  EntryHi?), but I don't have the
> required knowledge to fiddle w/ the TLB code to figure it out.
>
> So, I sent in the patch that marks CPU_SUPPORTS_HUGEPAGES as BROKEN until
> someone feels like tackling it (if ever).
>
> Sidenote: Is it possible to add additional CP0 registers to a register dump on
> a panic or oops?  I looked around ptrace.c and ptrace.h and see where these
> registers are set up and printed out, but I can't find out where the actual
> values are fetched from the CPU and put into struct pt_regs.  I am assuming
> it's a snippet of asm somewhere.  Adding R10K's PageMask, Config, ErrorEpc, and
> Context/XContext registers seems like useful debugging info.
>
> --
> Joshua Kinard
> Gentoo/MIPS
> kumba@xxxxxxxxxx
> 4096R/D25D95E3 2011-03-28
>
> "The past tempts us, the present confuses us, the future frightens us.  And our
> lives slip away, moment by moment, lost in that vast, terrible in-between."
>
> --Emperor Turhan, Centauri Republic

  Ralf
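
For anyone wanting to re-check the PageMask arithmetic quoted in this thread,
here is a minimal standalone C sketch (not kernel code).  It applies the quoted
formula 2 * normal_page_bits - 4 and derives the corresponding PageMask value.
The helper names are made up for illustration.  Two things are assumptions on
my part rather than statements from the thread: that a PM_* value is the span
of an even/odd page pair with the low 13 bits cleared, and that the R10000
implements only PageMask bits 24:13.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PageMask bits start at bit 13, as in the PM_* constants. */
#define PAGEMASK_LOW_BIT 13

/* Assumption: the R10000 implements PageMask bits 24:13 only. */
#define R10K_PAGEMASK_FIELD UINT32_C(0x01ffe000)

/* Hypothetical helper: log2 of the size named by the huge-page mask,
 * using the formula quoted in the thread (2 * normal_page_bits - 4). */
static unsigned int huge_page_mask_bits(unsigned int normal_page_bits)
{
    return 2 * normal_page_bits - 4;
}

/* Hypothetical helper: PageMask value for a TLB entry whose even and odd
 * pages are each 2^bits bytes.  The entry spans 2^(bits + 1) bytes; the
 * bits below PAGEMASK_LOW_BIT are never part of the mask. */
static uint32_t pagemask_for_bits(unsigned int bits)
{
    assert(bits >= 12 && bits <= 30);   /* keep the shift inside 32 bits */
    uint32_t span = (UINT32_C(1) << (bits + 1)) - 1;
    return span & ~((UINT32_C(1) << PAGEMASK_LOW_BIT) - 1);
}

/* Hypothetical helper: true if the mask only uses bits the R10000
 * is assumed to implement. */
static bool pagemask_fits_r10k(uint32_t mask)
{
    return (mask & ~R10K_PAGEMASK_FIELD) == 0;
}
```

Under those assumptions, page bits of 12, 14 and 16 (4K, 16K, 64K) give mask
bits of 20, 24 and 28, i.e. 0x001fe000, 0x01ffe000 and 0x1fffe000, which line
up with the PM_1M, PM_16M and PM_256M cases mentioned above.  Only the last one
fails pagemask_fits_r10k(), consistent with the remark that the 64K case would
not be valid on R10k.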