RE: [PATCH v2 0/3] support for broken memory modules (BadRAM)

"Luck, Tony" <tony.luck@xxxxxxxxx> · Fri, 24 Jun 2011 09:40:51 -0700

> > I am very curious about your findings.  Independently of those, I am in
> > favour of a patch that enables longer e820 tables if it has no further
> > impact on speed or space.
> > 
>
> That is already in the mainline kernel, although only if fed from the
> boot loader (it was developed in the context of mega-NUMA machines); the
> stub fetching from INT 15h doesn't use this at the moment.

Does it scale?  Current X86 systems go up to about 2TB - presumably
in the form of 256 8GB DIMMs (or maybe 512 4GB ones).  If a faulty
row or column on a DIMM can give rise to 4K bad pages, then these
large systems could conceivably have 1-2 million bad pages (while
still being quite usable - a loss of 4-8G from a 2TB system is down
in the noise).  Can we handle a 2 million entry e820 table? Do we
want to?

Perhaps we may end up with a composite solution. Use e820 to map out
the bad pages below some limit (like 4GB). Preferably in the boot loader
so it can find a range of good memory to load the kernel. Then use
badRAM patterns for addresses over 4GB for Linux to avoid bad pages
by flagging their page structures.

-Tony

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href