On 11/10/2014 05:51, Ralf Baechle wrote:
> Thomas,
>
> can you test CONFIG_TRANSPARENT_HUGEPAGE on an IP28?
>
> All in all the R10000's TLB is unproblematic; my gut feeling is that
> rather something else specific to IP27 is spoiling the broth.
>
>   Ralf

I don't know if it's specific to IP27.  I have problems on the Octane w/ an
R14000 and CONFIG_TRANSPARENT_HUGEPAGE (instruction bus errors, needs cold
reboot to clear).  I didn't have the same issues w/ the R12000 CPU module
installed, but I did not test things as thoroughly the last time I installed
it.

I'll see about swapping the R12K module back in tonight or tomorrow and doing
the same tests as on the IP27 that can trigger problems.

--J

> On Mon, Nov 10, 2014 at 02:04:10AM -0500, Joshua Kinard wrote:
>> Date: Mon, 10 Nov 2014 02:04:10 -0500
>> From: Joshua Kinard <kumba@xxxxxxxxxx>
>> To: David Daney <ddaney.cavm@xxxxxxxxx>
>> CC: Ralf Baechle <ralf@xxxxxxxxxxxxxx>, Linux MIPS List
>>  <linux-mips@xxxxxxxxxxxxxx>
>> Subject: Re: IP27: CONFIG_TRANSPARENT_HUGEPAGE triggers bus errors
>> Content-Type: text/plain; charset=windows-1252
>>
>> On 11/08/2014 19:09, Joshua Kinard wrote:
>>> On 11/07/2014 13:30, David Daney wrote:
>>>> On 11/07/2014 02:22 AM, Joshua Kinard wrote:
>>>> [...]
>>>>>
>>>>> So my guess is unless hugepages can happen in powers of 4,
>>>>
>>>> Huge pages are currently only supported on MIPS64 for this reason.
>>>>
>>>> huge_page_mask_size = (normal_page_size/8 * normal_page_size) / 2;
>>>>
>>>> If you take log2 of everything you get
>>>>
>>>> huge_page_mask_bits = normal_page_bits - 3 + normal_page_bits - 1
>>>>                     = 2 * normal_page_bits - 4 (always even)
>>>>
>>>> So all page sizes result in huge pages that meet the power of 4 criterion.
>>>
>>> Well, looks like I'll have to bisect to hunt the problem down.  Obviously there
>>> is something with transparent hugepages that the R10K-family dislikes.  Just a
>>> question of "what?".  Seems like I'm the only one left with this kind of
>>> equipment and interest to play with it :)
>>
>> I gave up on bisecting this.  3.7 and 3.9 kernels are not bootable on my Onyx2
>> w/o additional patches to fix the PCI probing code to deal with the card cage I
>> have in my system (basically, it stops probing after it discovers the first PCI
>> bus).  Even with that fixed, normal init refused to load on those kernels, and
>> dash as init just outright crashed.  Must be some other IP27 bug that was fixed
>> at some point, and I didn't feel like applying multiple patches to every bisect
>> checkout, which might've altered results and led me to blaming the wrong commit.
>>
>> It does look like the PageMask register is getting set to the correct values on
>> PAGE_SIZE_4K and PAGE_SIZE_16K when a hugepage is needed (PM_1M and PM_16M).
>> The PAGE_SIZE_64K case wouldn't be valid on R10k, as that uses PM_256M for a
>> hugepage, which is bits 28:13 in PageMask and that would lead to "undefined
>> behavior".  I'm assuming another register is getting set to an incorrect value
>> in the huge page case (EntryLo0 or EntryLo1?  EntryHi?), but I don't have the
>> required knowledge to fiddle w/ the TLB code to figure it out.
>>
>> So, I sent in the patch that marks CPU_SUPPORTS_HUGEPAGES as BROKEN until
>> someone feels like tackling it (if ever).
>>
>> Sidenote: Is it possible to add additional CP0 registers to a register dump on
>> a panic or oops?  I looked around ptrace.c and ptrace.h and see where these
>> registers are setup and printed out, but I can't find out where the actual
>> values are fetched from the CPU and put into struct pt_regs.  I am assuming
>> it's a snippet of asm somewhere.  Adding R10K's PageMask, Config, ErrorEpc, and
>> Context/XContext registers seems like useful debugging info.