Re: IP27: CONFIG_TRANSPARENT_HUGEPAGE triggers bus errors

On 11/08/2014 19:09, Joshua Kinard wrote:
> On 11/07/2014 13:30, David Daney wrote:
>> On 11/07/2014 02:22 AM, Joshua Kinard wrote:
>> [...]
>>>
>>> So my guess is unless hugepages can happen in powers of 4,
>>
>> Huge pages are currently only supported on MIPS64 for this reason.
>>
>> huge_page_mask_size = (normal_page_size/8 * normal_page_size) / 2;
>>
>> If you take log2 of everything you get
>>
>> huge_page_mask_bits = normal_page_bits - 3 + normal_page_bits - 1
>>   = 2 * normal_page_bits - 4 (always even)
>>
>> So all page sizes result in huge pages that meet the power of 4 criterion.
> 
> Well, looks like I'll have to bisect to hunt the problem down.  Obviously there
> is something with transparent hugepages that the R10K-family dislikes.  Just a
> question of "what?".  Seems like I'm the only one left with this kind of
> equipment and interest to play with it :)

I gave up on bisecting this.  3.7 and 3.9 kernels are not bootable on my Onyx2
w/o additional patches to fix the PCI probing code to deal with the card cage I
have in my system (basically, it stops probing after it discovers the first PCI
bus).  Even with that fixed, normal init refused to load on those kernels, and
dash as init just outright crashed.  Must be some other IP27 bug that was fixed
at some point, and I didn't feel like applying multiple patches to every bisect
checkout, which might've altered results and led me to blame the wrong commit.

It does look like the PageMask register is getting set to the correct values
for PAGE_SIZE_4K and PAGE_SIZE_16K when a hugepage is needed (PM_1M and PM_16M,
respectively).  The PAGE_SIZE_64K case wouldn't be valid on R10K, as that uses
PM_256M for a hugepage, which sets bits 28:13 in PageMask, and that would lead
to "undefined behavior".  I'm assuming another register is getting set to an
incorrect value in the huge page case (EntryLo0 or EntryLo1?  EntryHi?), but I
don't have the required knowledge to fiddle w/ the TLB code to figure it out.
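
For anyone wanting to double-check the arithmetic, here is a rough standalone
C sketch (not kernel code) of David's formula and the PageMask values it
implies.  The helper names are made up for illustration, the PM_* constants
are the values as I recall them from arch/mips/include/asm/mipsregs.h, and the
R10K limit of PageMask bits 24:13 is an assumption inferred from the above, so
treat it as a sanity check rather than anything authoritative.

```c
#include <stdint.h>

/* PageMask encodings, as recalled from arch/mips/include/asm/mipsregs.h. */
#define PM_1M    0x001fe000u
#define PM_16M   0x01ffe000u
#define PM_256M  0x1fffe000u

/* ASSUMPTION: R10K only implements PageMask bits 24:13 (pages up to 16M). */
#define R10K_PAGEMASK_BITS 0x01ffe000u

/*
 * Hypothetical helper: log2 of the size mapped by each half of a huge page
 * TLB entry pair, i.e. log2((page_size / 8 * page_size) / 2), which works
 * out to 2 * page_bits - 4.
 */
static unsigned int huge_mask_bits(unsigned int page_bits)
{
	return 2 * page_bits - 4;
}

/* Hypothetical helper: PageMask value for a 2^bits page (sets bits [bits:13]). */
static uint32_t pagemask_for_bits(unsigned int bits)
{
	return (uint32_t)(((1ull << (bits + 1)) - 1) & ~0x1fffull);
}

/* Nonzero if the mask only uses bits assumed to exist on R10K. */
static int r10k_pagemask_ok(uint32_t mask)
{
	return (mask & ~R10K_PAGEMASK_BITS) == 0;
}
```

With page_bits of 12, 14, and 16 (4K, 16K, and 64K base pages),
huge_mask_bits() gives 20, 24, and 28, which pagemask_for_bits() turns into
PM_1M, PM_16M, and PM_256M.  Only PM_256M fails r10k_pagemask_ok(), which
matches the PAGE_SIZE_64K case described above.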

So, I sent in the patch that marks CPU_SUPPORTS_HUGEPAGES as BROKEN until
someone feels like tackling it (if ever).

Sidenote: Is it possible to add additional CP0 registers to the register dump
on a panic or oops?  I looked around ptrace.c and ptrace.h and see where these
registers are set up and printed out, but I can't find where the actual values
are fetched from the CPU and put into struct pt_regs.  I am assuming it's a
snippet of asm somewhere.  Adding R10K's PageMask, Config, ErrorEPC, and
Context/XContext registers seems like useful debugging info.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic




