On Fri, 8 May 2015 09:44:21 -0700 Tony Luck <tony.luck@xxxxxxxxx> wrote:

> Some high end Intel Xeon systems report uncorrectable memory errors
> as a recoverable machine check. Linux has included code for some time
> to process these and just signal the affected processes (or even
> recover completely if the error was in a read only page that can be
> replaced by reading from disk).
>
> But we have no recovery path for errors encountered during kernel
> code execution. Except for some very specific cases, we are unlikely
> to ever be able to recover.
>
> Enter memory mirroring. Actually 3rd generation of memory mirroring.
>
> Gen1: All memory is mirrored
>     Pro: No s/w enabling - h/w just gets good data from other side of the mirror
>     Con: Halves effective memory capacity available to OS/applications
> Gen2: Partial memory mirror - just mirror memory behind some memory controllers
>     Pro: Keep more of the capacity
>     Con: Nightmare to enable. Have to choose between allocating from
>          mirrored memory for safety vs. NUMA local memory for performance
> Gen3: Address range partial memory mirror - some mirror on each memory controller
>     Pro: Can tune the amount of mirror and keep NUMA performance
>     Con: I have to write memory management code to implement
>
> The current plan is just to use mirrored memory for kernel allocations. This
> has been broken into two phases:
> 1) This patch series - find the mirrored memory, use it for boot time allocations
> 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused
>    mirrored memory from mm/memblock.c and only give it out to select kernel
>    allocations (this is still being scoped because page_alloc.c is scary).

Looks good to me. What happens to these patches while ZONE_MIRROR is
being worked on?

I'm wondering about phase II. What does "select kernel allocations"
mean? I assume we can't say "all kernel allocations" because that can
sometimes be "almost all memory". How are you planning on implementing
this?
A new __GFP_foo flag, then sprinkle that into selected sites?

Will surplus ZONE_MIRROR memory be available for regular old movable
allocations?

I suggest you run the design ideas by Mel before getting into
implementation.
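
To make the "flag sprinkled into selected sites" question concrete, here is a user-space toy model of one possible policy. It is only a sketch and is not kernel code: TOY_GFP_MIRROR (standing in for the placeholder "__GFP_foo"), struct toy_allocator, toy_alloc_page() and the share_surplus switch are all hypothetical names invented for illustration. The fallback to ordinary memory when the mirror is exhausted is likewise an assumption, not something the patch series specifies.

```c
#include <stdbool.h>

/* Hypothetical flag standing in for "__GFP_foo"; not a real kernel flag. */
#define TOY_GFP_MIRROR 0x1u

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MIRROR, TOY_ZONE_NONE };

/* Toy state: free page counts only, no real memory is managed. */
struct toy_allocator {
    unsigned long free_normal;  /* free pages in ordinary memory */
    unsigned long free_mirror;  /* free pages in mirrored memory */
    bool share_surplus;         /* may unflagged callers borrow mirrored pages? */
};

/*
 * Take one page and report which zone it came from.
 * Flagged callers try mirrored memory first and fall back to normal
 * memory (assumed policy). Unflagged callers use normal memory and
 * touch the mirror only when share_surplus is set.
 */
static enum toy_zone toy_alloc_page(struct toy_allocator *a, unsigned int flags)
{
    if (flags & TOY_GFP_MIRROR) {
        if (a->free_mirror > 0) {
            a->free_mirror--;
            return TOY_ZONE_MIRROR;
        }
        if (a->free_normal > 0) {
            a->free_normal--;
            return TOY_ZONE_NORMAL;
        }
        return TOY_ZONE_NONE;
    }

    if (a->free_normal > 0) {
        a->free_normal--;
        return TOY_ZONE_NORMAL;
    }
    if (a->share_surplus && a->free_mirror > 0) {
        a->free_mirror--;
        return TOY_ZONE_MIRROR;
    }
    return TOY_ZONE_NONE;
}
```

In this model, the "selected sites" are simply the callers that pass TOY_GFP_MIRROR. Setting share_surplus answers the movable-allocation question with "yes", at the cost of letting unflagged callers consume pages that a later flagged request might have wanted.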