RE: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations

"Luck, Tony" <tony.luck@xxxxxxxxx> · Tue, 30 Jun 2015 18:12:35 +0000

> Sounds logical. In that case, bootmem awareness would be crucial.
> Enabling support in just the page allocator is too late.

Andrew already applied some patches from me that I think covered bootmem
mirror allocations:

commit fc6daaf93151877748f8096af6b3fddb147f22d6
    mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
commit a3f5bafcc04aaf62990e0cf3ced1cc6d8dc6fe95
    mm/memblock: allocate boot time data structures from mirrored memory
commit b05b9f5f9dcf593a0e9327676b78e6c17b4218e8
    x86, mirror: x86 enabling - find mirrored memory ranges

If I missed something, please let me know.
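
For anyone who has not looked at those commits, the general idea of selecting
memory by attribute can be sketched in user space as below.  This is only an
illustration, not the memblock code: the names (struct region, REGION_MIRROR,
find_in_regions, boot_alloc) are made up for the example, and the fallback to
unmirrored memory when the mirror is exhausted is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch, not the kernel's definitions. */
enum region_flags {
	REGION_NONE   = 0x0,
	REGION_MIRROR = 0x1,
};

/* One contiguous range of physical memory with a simple bump allocator. */
struct region {
	uint64_t base;       /* must be non-zero: 0 is used as "no memory" */
	uint64_t size;
	unsigned int flags;
	uint64_t used;       /* bytes already handed out */
};

/*
 * Return the address of 'size' bytes taken from the first region whose
 * flags include every bit in 'want', or 0 if no such region has room.
 */
static uint64_t find_in_regions(struct region *r, size_t n,
				uint64_t size, unsigned int want)
{
	for (size_t i = 0; i < n; i++) {
		if ((r[i].flags & want) != want)
			continue;
		if (r[i].size - r[i].used < size)
			continue;
		uint64_t addr = r[i].base + r[i].used;
		r[i].used += size;
		return addr;
	}
	return 0;
}

/* Prefer mirrored memory; fall back to any memory if the mirror is full. */
static uint64_t boot_alloc(struct region *r, size_t n,
			   uint64_t size, int *from_mirror)
{
	assert(size > 0);
	uint64_t addr = find_in_regions(r, n, size, REGION_MIRROR);
	*from_mirror = (addr != 0);
	if (!addr)
		addr = find_in_regions(r, n, size, REGION_NONE);
	return addr;
}
```

A caller would describe its ranges in an array of struct region and call
boot_alloc() for each early allocation; from_mirror reports whether the
request was actually satisfied from the mirrored range.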

>> In that sense 'protecting' all kernel allocations is natural: we don't know how to 
>> recover from faults that affect kernel memory.
>> 
>
> It potentially uses all mirrored memory on memory that does not need that
> sort of guarantee. For example, if there was a MC on memory backing the
> inode cache then potentially that is recoverable as long as the inodes
> were not dirty.

Right now this is hard to do.  On Intel we get a broadcast machine check, which
may catch bystander CPUs holding the locks we would need in order to look at
kernel structures and decide what we just lost.  That may get easier with local
machine check, where only the logical CPU that tried to consume the corrupt data
gets the machine check.  Patches for basic support of this are in Linux; we are
waiting for h/w that does it.

> That's a minor detail as the kernel could later protect
> only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal
> MC in kernel space could be distinguished from non-fatal checks.

So the immediate use case is large memory servers (hundred+ Gbytes to
TBytes) running some applications that use most of memory in user mode
(like a database).  We mirror enough memory to cover *all* the kernel allocations
so that a bad memory access will be fixed from the mirror for kernel, or result
in SIGBUS to a process for user page ... either way we don't crash the system.
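
Put as a decision table, the intended outcomes look roughly like the sketch
below.  The types and names (struct bad_page, classify, the MC_* values) are
hypothetical and exist only for this example.  The "unmirrored kernel page is
fatal" case is inferred from the remark quoted above that we don't know how to
recover from faults that affect kernel memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible results of consuming bad memory, as described in this mail. */
enum mc_outcome {
	MC_FIXED_FROM_MIRROR,   /* h/w supplies good data from the mirror */
	MC_SIGBUS_TO_PROCESS,   /* user page lost, only that process is hit */
	MC_FATAL,               /* kernel data lost, no way to recover */
};

/* Hypothetical description of the page that took the error. */
struct bad_page {
	bool kernel_owned;      /* page holds a kernel allocation */
	bool mirrored;          /* page lies in a mirrored range */
};

static enum mc_outcome classify(const struct bad_page *p)
{
	assert(p != NULL);
	if (p->mirrored)
		return MC_FIXED_FROM_MIRROR;
	if (p->kernel_owned)
		return MC_FATAL;
	return MC_SIGBUS_TO_PROCESS;
}
```

With all kernel allocations placed in mirrored memory, the MC_FATAL branch is
never reached, which is the point of mirroring enough to cover the kernel.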

Perhaps in the future we might find some places in the kernel where we can
cover a lot of memory without too many code changes ... e.g. things like
pagecopy().  At that time we'd have to think about allocation priorities.

-Tony
