RE: [RFC] Make the memory failure blast radius more precise


> Both the RFC patch and the above 5-step recovery plan look neat, step 4) 
> is nice to carry forward on icelake when a single instruction to clear
> poison is available.

Jane,

Clearing poison has some challenges.

On persistent memory it probably works, since the DIMM will remap that address to a different
part of the media to avoid the bad spot.

On DDR memory you'd need to decide whether the problem was transient, in which case a simple
overwrite fixes it, or persistent, in which case it will likely come back with the right data
pattern. To tell which, you may need to run a memory test on the affected area.
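
A minimal sketch of such a test in C. It assumes a simulated stuck-at fault model rather than real hardware. `struct sim_mem`, `sim_write()`, `sim_read()` and `pattern_test()` are hypothetical names for illustration, not kernel APIs. The test writes a few fixed patterns to every word and reads them back. A stuck bit fails either the all-zeros or the all-ones pass, while a healthy region passes all four.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated memory: one word may have bits stuck at a value. */
struct sim_mem {
    uint64_t *cells;
    size_t nwords;
    size_t bad_index;     /* word containing the simulated fault */
    uint64_t stuck_mask;  /* bits that ignore writes; 0 means healthy */
    uint64_t stuck_value; /* value the stuck bits always hold */
};

/* Store a word; stuck bits in the faulty word keep their stuck value. */
static void sim_write(struct sim_mem *m, size_t i, uint64_t v)
{
    if (i == m->bad_index)
        v = (v & ~m->stuck_mask) | (m->stuck_value & m->stuck_mask);
    m->cells[i] = v;
}

static uint64_t sim_read(const struct sim_mem *m, size_t i)
{
    return m->cells[i];
}

/* Fill the region with each pattern, then verify every word reads back. */
static bool pattern_test(struct sim_mem *m)
{
    static const uint64_t patterns[] = {
        0x0000000000000000ULL, 0xFFFFFFFFFFFFFFFFULL,
        0x5555555555555555ULL, 0xAAAAAAAAAAAAAAAAULL,
    };

    assert(m->cells != NULL);
    for (size_t p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++) {
        for (size_t i = 0; i < m->nwords; i++)
            sim_write(m, i, patterns[p]);
        for (size_t i = 0; i < m->nwords; i++)
            if (sim_read(m, i) != patterns[p])
                return false; /* mismatch: looks like a persistent fault */
    }
    return true; /* every pattern read back intact */
}
```

For example, a 512-word region with bit 7 of word 100 stuck at 0 fails the test, while the same region with `stuck_mask` set to 0 passes. A real memory test (e.g. a march test) covers more fault types than these four fixed patterns.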

If the error was just in a 4K page, I'd be inclined to copy the good data to a new page and
map that in instead. Throwing away one 4K page isn't likely to be painful.

If it is in a 2M/1G page ... perhaps it is worth the effort and risk of trying to clear the poison
in place to avoid the pain of breaking up a large page.
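
The policy above could be summarised as a small decision function. This is one reading of the message, not an existing kernel interface. The enums, `choose_action()` and the split-and-remap fallback for a persistent error in a large page are all hypothetical. Applying in-place clearing to persistent memory regardless of page size is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum media_type { MEDIA_PMEM, MEDIA_DDR };
enum page_size { PAGE_4K, PAGE_2M, PAGE_1G };

enum recovery_action {
    ACTION_CLEAR_IN_PLACE,  /* overwrite the poisoned data, keep the page */
    ACTION_COPY_AND_REMAP,  /* copy good data to a new page, retire the old one */
    ACTION_SPLIT_AND_REMAP, /* break up the large page, retire only the bad 4K */
};

/*
 * Hypothetical policy. 'error_is_transient' would come from something
 * like a memory test run on the affected area.
 */
static enum recovery_action choose_action(enum media_type media,
                                          enum page_size size,
                                          bool error_is_transient)
{
    /* Persistent memory: the DIMM is expected to remap the bad spot. */
    if (media == MEDIA_PMEM)
        return ACTION_CLEAR_IN_PLACE;

    /* DDR, 4K page: throwing away one 4K page is cheap. */
    if (size == PAGE_4K)
        return ACTION_COPY_AND_REMAP;

    /* DDR, 2M/1G page: clearing in place avoids breaking up the large
     * page, but only makes sense if the error looks transient. */
    assert(size == PAGE_2M || size == PAGE_1G);
    return error_is_transient ? ACTION_CLEAR_IN_PLACE : ACTION_SPLIT_AND_REMAP;
}
```

The order of the checks mirrors the trade-off in the message: the cheap 4K case never risks an in-place clear, and the large-page case only takes that risk when the error looks transient.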

-Tony
