Re: THP broken on OCTEON?

On 05/23/2016 15:22, Ralf Baechle wrote:
> On Mon, May 23, 2016 at 02:57:30PM -0400, Joshua Kinard wrote:
> 
>> NAK, this issue looks completely different to IP30/IP27.  In this case, it
>> looks like the hardware is detecting the case where multiple TLB entries match
>> and it's killing the machine to avoid hardware damage.  I don't want to know
>> how the SGI systems handle this scenario (does the R10000 do a TLB shutdown??).
> 
> The R10000 detects duplicate entries when writing to the TLB and
> invalidates the previous entry.  That is, there will never be duplicate
> entries in the TLB and of course no TLB shutdown.
> 
> That's the theory.  I'm wondering how well that is going to work if
> the entries are having a different page size.
> 
> And Aaro doesn't always get machine checks so it's not like always a
> duplicate entry is written.
> 
>> On IP30, using THP usually results in instruction bus errors (IBE), after a set
>> time, depending on the machine's configuration (<2GB RAM, virtually instant on
>> userland init; >2GB RAM, might survive for a few minutes, even getting all the
>> way to runlevel 3 randomly).
>>
>> IP27 was somewhat similar to IP30, in that THP usually results in IBEs after a
>> few seconds of hitting userland bringup (bash is pretty quick at triggering an
>> IBE), but I haven't tried experimenting with varying the amount of RAM in that
>> machine, due to the fragility of pulling the nodeboards out constantly.  I also
>> haven't tried THP since refactoring/rewriting the IP27 code back in Feb to see
>> if I magically fixed it...

For IP30, I created a BUGS file in my local source (also in the IP30 patch I
still maintain) that documents some combinations of settings that affect THP
on the platform.  Most importantly, using a PAGE_SIZE other than 4KB also
requires setting MAX_ZONE_ORDER to a suitable value; otherwise, on Octane,
it hits IBEs as soon as the kernel executes /sbin/init.  The workable values
also depend on the amount of RAM in the system:

>>2GB RAM:
>  - In order to use more than 2GB RAM in IP30/Octane requires selecting
>    VERY specific values for certain Kconfig options.  Specifically,
>    the following options under the "Kernel type" submenu:
>      - PAGE_SIZE
>      - Maximum Zone Order
>      - Transparent Hugepages (THP)
> 
>    A table of the specific settings is below:
>     PAGE_SIZE | Zone Order | THP
>    -----------|------------|-----
>        4KB    | 11 to 13   |  N
>       16KB    | 12 Only    |  Y
>       64KB*   | 14 Only    |  Y
> 
>    Any other configuration of these three options will likely lead to
>    Instruction Bus Errors (IBEs) when the kernel loads userland up (when it
>    execve()'s /sbin/init).  Even then, however, the machine will still be
>    very unstable (depending on the operations it does).  Heavy disk I/O
>    still seems capable of crashing the machine due to either NULL pointer
>    dereferences, unhandled kernel unaligned accesses, or Instruction Bus
>    Errors.
> 
>    * Impact users cannot currently use an Impact board with 64KB PAGE_SIZE,
>      THP, and >2GB RAM.  This will trigger a NULL pointer dereference in
>      impact_resize_kpool() (when called initially from impact_common_probe()
>      to set the initial 64KB kpool on pool '0') due to (possibly) vzalloc()
>      returning a NULL pointer when allocating kpool_virt[pool].
> 
>    * THP still has issues on R1x000 CPUs, so user beware.  YMMV.
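
For anyone wanting to sanity-check a .config against that table before
booting, here is a minimal sketch of the lookup.  The rows are copied straight
from the table above; the names KNOWN_GOOD and is_known_good() are made up for
illustration and do not exist in the kernel or in the IP30 patch.  It only
says whether a combination is listed in the table, nothing more.

```python
# Rows copied from the BUGS table (IP30/Octane with >2GB RAM).
# Key: PAGE_SIZE in KB.  Value: (min zone order, max zone order, THP enabled).
# KNOWN_GOOD and is_known_good() are hypothetical names for this sketch.
KNOWN_GOOD = {
    4: (11, 13, False),
    16: (12, 12, True),
    64: (14, 14, True),
}


def is_known_good(page_size_kb, zone_order, thp):
    """Return True if the combination matches a row of the table."""
    row = KNOWN_GOOD.get(page_size_kb)
    if row is None:
        # PAGE_SIZE values not in the table (e.g. 32KB) are simply unlisted.
        return False
    lo, hi, want_thp = row
    return lo <= zone_order <= hi and thp == want_thp


if __name__ == "__main__":
    candidates = [(4, 11, False), (16, 12, True), (16, 13, True), (64, 14, False)]
    for combo in candidates:
        verdict = "listed" if is_known_good(*combo) else "not listed"
        print(combo, verdict)
```

A "not listed" result only means the combination is outside the table for the
>2GB RAM case; it says nothing about machines with less RAM or about other
platforms.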


It might be worth trying some of those combinations to see if things improve
on the Octeon.  IP27 was equally affected by this, minus the bits about RAM
and Impact Gfx.  With THP turned off, IP30 can run 64KB PAGE_SIZE without
issue (package compiles are actually sped up quite significantly with a
PAGE_SIZE larger than 4KB).

IP27 has a bug in it somewhere that causes an immediate Oops on 64KB PAGE_SIZE
that I haven't traced down yet (I have the Oops saved somewhere if needed).  So
I use 16KB on that system.

An O2 w/ an RM7000 has virtually no issues at all with 64KB or 16KB PAGE_SIZE
and THP, though it's been several months since I last booted my O2.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



