Re: [PATCH 3/3] Add documentation and credits for BadRAM

Randy Dunlap <rdunlap@xxxxxxxxxxxx> · Tue, 24 May 2011 16:55:43 -0700

On Tue, 24 May 2011 13:20:48 +0200 Stefan Assmann wrote:

> Add Documentation/BadRAM.txt for in-depth information and update
> Documentation/kernel-parameters.txt.
> 
> Signed-off-by: Stefan Assmann <sassmann@xxxxxxxxx>
> ---
>  CREDITS                             |    9 +
>  Documentation/BadRAM.txt            |  370 +++++++++++++++++++++++++++++++++++
>  Documentation/kernel-parameters.txt |    6 +
>  3 files changed, 385 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/BadRAM.txt
> 
> diff --git a/CREDITS b/CREDITS
> index dca6abc..d57d4af 100644
> --- a/CREDITS
> +++ b/CREDITS
> @@ -2899,6 +2899,15 @@ S: 6 Karen Drive
>  S: Malvern, Pennsylvania 19355
>  S: USA
>  
> +N: Rick van Rein
> +E: rick@xxxxxxxxxxx
> +W: http://rick.vanrein.org/
> +D: Memory, the BadRAM subsystem dealing with defective RAM modules.
> +S: Haarlebrink 5
> +S: 7544 WP  Enschede
> +S: The Netherlands
> +P: 1024D/89754606  CD46 B5F2 E876 A5EE 9A85  1735 1411 A9C2 8975 4606
> +
>  N: Stefan Reinauer
>  E: stepan@xxxxxxxx
>  W: http://www.freiburg.linux.de/~stepan/
> diff --git a/Documentation/BadRAM.txt b/Documentation/BadRAM.txt
> new file mode 100644
> index 0000000..3fb4994
> --- /dev/null
> +++ b/Documentation/BadRAM.txt
> @@ -0,0 +1,370 @@
> +INFORMATION ON USING BAD RAM MODULES
> +====================================
> +
> +The BadRAM feature enables Linux to run on broken memory.  The
> +resulting system will be stable and healthy, because the kernel
> +simply never allocates the faulty pages for use.  This is how
> +to setup BadRAM if your memory is failing.
> +
> +
> +Introduction
> +------------
> +
> +As RAM memory grows smaller, it also becomes harder to manufacture
> +chips that are perfect.  Each single cell that is failing could cause
> +an entire memory module to fail.  Even though manufacturers put in
> +extra cells to replace failed ones, it is still possible that the
> +sensitive small structures get damaged by an electric discharge on

I would say:                                    electrical
but I can't say why...

> +their pins.  Such damage leads to problems in fixed locations of
> +the address space of a memory module, which is what theory predicts
> +and has been confirmed by years of experience with bad memory.
> +
> +It is not necessary for such a memory module to be discarded.  All
> +pages of memory behave the same, and if only we skip the failing
> +pages we can continue to use the module for many more years.  The
> +operating system kernel simply has to avoid using the blocks that
> +are damaged.  This is easy to do in the part of the kernel where
> +memory pages are allocated.
> +
> +
> +Reasons for using BadRAM
> +------------------------
> +
> +Chip manufacturing processes use lots of harsh chemicals, and the less
> +of these used, the better.  Being able to make good use of partially
> +failed memory chips means that far less of those chemicals are needed
> +to provide storage.  This reduces expenses and it is lighter on the
> +environment in which we live.
> +
> +This kernel feature clearly shows that Linux is "the flexible OS".
> +If something does not work, fix it.  Also, share it with all the
> +others that could use it.  After more than a decennium of BadRAM,

          who                                   or decade

> +the response has been purely positive, because it has helped real
> +people to solve real problems.
> +
> +One important use for this feature is with laptops that have their
> +memory soldered in.  Such laptops would have to be discarded as a
> +whole, but with BadRAM in place they can continue to be used
> +without further restrictions.
> +
> +Finally, running a system on broken memory is just plain cool ;-)
> +
> +
> +Running example
> +---------------
> +
> +To run this project, I was given two DIMMs, 32 MB each. One, that we
> +shall use as a running example in this text, contained 512 faulty bits,
> +spread over 1/4 of the address range in a regular pattern.  This looks
> +a lot like the fauly pattern that many others have reported; the only
> +common other pattern is a single faulty spot.  With such memory, a few
> +tricks with a thorough RAM tester and some binary calculations suffice
> +to write these fault patterns down in 2 longword numbers.  The format
> +of these is hexadecimal, which is a condensed way of writing down the
> +binary patterns that make the hardware patterns recognisable.
> +
> +After being patched and invoked with the properly formatted description,
> +the kernel held back only the memory pages with faults, and never handed
> +them out for allocation. The allocation routines could therefore
> +progress as normally, without any adaption.  This is important, since

                                     adaptation.

> +all the work is done at booting time.  After booting, the kernel does
> +not have to do spend any time to implement BadRAM.
> +
> +As a result of this initial exercise, I gained 30 MB out of the 32 MB
> +DIMM that would otherwise have been thrown away.  Of course, these
> +numbers scale up with larger memory modules, but the principle is
> +the same.
> +
> +
> +The structure of memory failures
> +--------------------------------
> +
> +Memory chips are usually laid out in a roughly equal number of rows
> +and columns, making it a square of cells that each store one bit.
> +When addressing a bit, the processor sends the row and column in
> +separate phases, and then reads or writes its value.  The rows and
> +columns are therefore visible on the outside of a chip.
> +
> +The connections of row and column lines to the outside world is
> +usually protected by a buffer.  It can happen that a static
> +discharge damages such a buffer, causing an entire row or an
> +entire column to fail.  This means that a series of bits become
> +unusable in a single page or in a regular pattern of pages,
> +depending on whether it was a row or column that got damaged.
> +
> +For this reason, BadRAM was designed to describe memory faults
> +in a pattern of address/mask pairs.  An address locates an
> +error and a zero on the corresponding position in the mask
> +defines which bits in the address may be replaced with any
> +other value.  This has shown to work as a tight description
> +of error patterns: it is very compact, but does not waste pages
> +that are good.
> +
> +
> +BadRAM's notation for memory faults
> +-----------------------------------
> +
> +Instead of manually providing all 512 errors in the running example
> +to the kernel, it's easier to use a pattern notation. Since the
> +regularity is based on address decoding software, which generally
> +takes certain bits into account and ignores others, we shall
> +provide a faulty address F, together with a bit mask M that
> +specifies which bits must be equal to F. In C code, an address A
> +is faulty if and only if
> +
> +	(F & M) == (A & M)
> +
> +or alternately (closer to a hardware implementation):
> +
> +	~((F ^ A) & M)
> +
> +In the example 32 MB chip, I had the faulty addresses in 8MB-16MB:
> +
> +	xxx42f4         ....0100....
> +	xxx62f4         ....0110....
> +	xxxc2f4         ....1100....
> +	xxxe2f4         ....1110....
> +
> +The second column represents the alternating hex digit in binary form.
> +Apparently, the first and next to last binary digit can be anything,
> +so the binary mask for that part is 0101. The mask for the part after
> +this is 0xfff, and the part before should select anything in the range
> +8MB-16MB, or 0x00800000-0x01000000; this is done with a bitmask
> +0xff80xxxx. Combining these partial masks, we get:
> +
> +	F=0x008042f4    M=0xff805fff
> +
> +That covers every fault in this DIMM; for more complicated failing
> +DIMMs, or for a combination of multiple failing DIMMs, it can be
> +necessary to set up a number of such F/M pairs.
> +
> +
> +Getting started
> +---------------
> +
> +If you experience RAM trouble, first read Documentation/memory.txt
> +and try out the mem=4M trick to see if at least some initial parts
> +of your RAM work well.  Note that 4 MB will not be able to hold a
> +modern desktop, so if you rely on that you would have to set the
> +limit higher (and accept that your sanity check is not as tight as
> +possible).
> +
> +The BadRAM routines halt the kernel in panic if the reserved area
> +of memory (containing kernel stuff) contains a faulty address.  It
> +will only do that when supplied with the patterns below; this
> +initial check is merely to see if this is likely to happen.
> +
> +
> +Running a memory checker
> +------------------------
> +
> +There is no memory checker built into the kernel, to avoid delays
> +at runtime or while booting. If you experience problems that may
> +be caused by RAM, run a good outside RAM checker.  The Memtest86
> +checker is a popular, free, high-quality checker.  Many Linux
> +distributions include it as an alternate boot option, so you may
> +simply find it in your boot loader's boot menu.
> +
> +
> +The memory checker lists all addresses that have a fault.  It will
> +do this for a given configuration of the DIMMs in your motherboard;
> +if you replace or move memory modules you may find other addresses.
> +In the running example's 32 MB chip, with the DIMM in slot #0 on
> +the motherboard, the errors were found in the 8MB-16MB range:
> +
> +	xxx42f4
> +	xxx62f4
> +	xxxc2f4
> +	xxxe2f4
> +
> +The error reported was a "sticky 1 bit", a memory bit that always
> +reads as "1" even if a "0" was just written to it.  This is
> +probably caused by a damaged buffer on one of the rows or columns
> +in one of the memory chips.
> +
> +It would be a lot of work to collect the individual errors and
> +condense them into a pattern.  That is why I patched the
> +Memtest86 (v2.3+) checker to directly print out the address/mask
> +pairs that are used by this kernel feature. All you would do is
> +select the BadRAM printout option at the start of the scan, and
> +then leave it running for hours and hours, until it has made at
> +least one pass.  The patterns are printed each time a bit is
> +added, but each line contains all faults found up to that point,
> +so you would write down the last set of patterns printed, and
> +supply that as a boot option in your next run of a
> +BadRAM-capable Linux kernel.
> +
> +If you use this patch on an x86_64 architecture, your addresses are
> +twice as long.  Fill up with zeroes in the address and with f's in
> +the mask.  The latter example would thus become:
> +
> +	mem=24M badram=0x0000000000f00000,0xfffffffffff00000
> +
> +The patch applies the changes to both x86 and x86_64 code bases
> +at the same time.  Patching but not compiling maps the entire
> +source tree at once, which makes more sense than splitting the
> +patch into an x86 and x86_64 branch, because those two branches
> +could not be applied at the same time because they would overlap.
> +
> +
> +Rebooting Linux
> +---------------
> +
> +Once the fault patterns are known we simply restart Linux with
> +these F/M pairs as a parameter If your normal boot options look

                        parameter. If

> +like
> +
> +       root=/dev/sda1 ro
> +
> +you should now boot with options
> +
> +       root=/dev/sda1 ro badram=0x008042f4,0xff805fff
> +
> +or perhaps by mentioning more F/M pairs in an order F0,M0,F1,M1,...
> +When you provide an odd number of arguments to badram, the default
> +mask 0xffffffff (meaning that only one address is matched) is
> +applied to the last address.
> +
> +If your bootloader is GRUB, you can supply this additional
> +parameter interactively during boot.  This way, you can try them
> +before you edit /boot/grub/grub.conf to put them in forever.
> +
> +When the kernel now boots, it should not give any trouble with RAM.
> +Mind you, this is under the assumption that the kernel and its data
> +storage do not overlap an erroneous part. If they do, and the
> +kernel does not choke on it right away, BadRAM itself will stop the
> +system with a kernel panic.  When the error is that low in memory,
> +you will need additional bootloader magic, to load the kernel at an
> +alternative address.
> +
> +Now look up your memory status with
> +
> +	cat /proc/meminfo |grep HardwareCorrupted
> +
> +which prints a single line with information like
> +
> +HardwareCorrupted:  2048 kB
> +
> +The entry HardwareCorrupted: 2048k represents the loss of 2MB
> +of general purpose RAM due to the errors. Or, positively rephrased,
> +instead of throwing out 32MB as useless, you only throw out 2MB.
> +Note that 2048 kB equals 512 pages of 4kB.  The size of a page is
> +defined by the processor architecture.
> +
> +If the system is stable (which you can test by compiling a few
> +kernels, and a few file finds in / or so) you can decide to add
> +the boot parameter to /boot/grub/grub.conf, in addition to any
> +other boot parameters that may already be there.  For example,
> +
> +	kernel /boot/vmlinuz root=/dev/sda1 ro
> +
> +would become
> +
> +	kernel /boot/vmlinuz root=/dev/sda1 ro badram=0x008042f4,0xff805fff
> +
> +Depending on how helpful your Linux distribution is, you may
> +have to add this feature again after upgrading your kernel.  If
> +your boot loader is GRUB, you can always do this manually if you
> +rebooted before you remembered to make that adaption.

                                               adaptation.

> +
> +
> +BadRAM classification
> +---------------------
> +
> +This technique might start a lively market for "dead" RAM. It is
> +important to realise that some RAMs are more dead than others. So,
> +instead of just providing a RAM size, it is also important to know
> +the BadRAM class, which is defined as follows:
> +
> +	A BadRAM class N means that at most 2^N bytes have a problem,
> +	and that all problems with the RAMs are persistent: They
> +	are predictable and always show up.
> +
> +The DIMM that serves as an example here was of class 9, since 512=2^9
> +errors were found. Higher classes are worse, "correct" RAM is of class
> +-1 (or even less, at your choice).
> +Class N also means that the bitmask for your chip (if there's just one,
> +that is) counts N bits "0" and it means that (if no faults fall in the
> +same page) an amount of 2^N*PAGESIZE memory is lost, in the example on
> +an x86 architecture that would be 2^9*4k=2MB, which accounts for the
> +initial claim of 30MB RAM gained with this DIMM.
> +
> +Note that this scheme has deliberately been defined to be independent
> +of memory technology and of computer architecture.
> +
> +
> +Further Possibilities
> +---------------------
> +
> +**Slab allocation support**
> +
> +It would be possible to use even more of the faulty RAMs by employing
> +them for slabs. The smaller allocation granularity of slabs makes it
> +possible to throw out just, say, 32 bytes surrounding an error. This
> +would mean that the example DIMM only caused a loss of 16kB instead
> +of 2MB, or scaled-up similar values for larger memory sizes.  One
> +specific area that could benefit from this is the growing market
> +for embedded devices, which usually wants to meet tight budgets.
> +
> +It should be possible to make the slab allocator prefer pages with
> +broken memory, and allocate the faulty places in memory before the
> +other slabs are made available to the kernel.  In the best possible
> +situation, this could reduce the loss of good RAM cells to zero!
> +
> +**Support for low-memory errors**
> +
> +To the best of my knowledge, boot loaders like GRUB cannot load
> +the Linux kernel in non-standard locations.  This means that any
> +errors at low memory locations cannot be overcome with BadRAM.
> +
> +Anything that physically alters the memory layout can be used
> +to overcome such problems; this may be achieved through BIOS
> +settings, or by adding or swapping memory modules.
> +
> +A general solution could be to use a boot loader that can load
> +the Linux kernel (and its initial memory allocation) at other
> +memory addresses than are standard.
> +
> +
> +**Boot-time memory checking**
> +
> +Many suggestions have been made to insert a RAM checker at boot time;
> +since this would leave the time to do only very meager checking, it
> +is not a reasonable option; we already have a half-done BIOS check
> +doing that!
> +
> +**ECC RAM integration**
> +
> +It would be interesting to integrate this functionality with the
> +self-verifying nature of ECC RAM. These memories can even distinguish
> +between recoverable and unrecoverable errors! Such memory has been
> +handled in older operating systems by `testing' once-failed memory
> +blocks for a while, by placing only (reloadable) program code in it.
> +
> +I possess no faulty ECC modules to work this out, and there is no
> +general use for it either.
> +
> +
> +Names and Places
> +----------------
> +
> +The home page of this project is on
> +	http://rick.vanrein.org/linux/badram
> +This page also links to Nico Schmoigl's experimental extensions to
> +this patch (with debugging and a few other fancy things).
> +
> +In case you have experiences with the BadRAM software which differ from
> +the test reportings on that site, I hope you will mail me with that
> +new information.
> +
> +The BadRAM project is an idea and implementation by
> +	Rick van Rein
> +	Haarlebrink 5
> +	7544 WP  Enschede
> +	The Netherlands
> +	rick@xxxxxxxxxxx
> +If you like it, a postcard would be much appreciated ;-)
> +
> +
> +							Enjoy,
> +							 -Rick.
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index cc85a92..ba3e984 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -51,6 +51,7 @@ parameter is applicable:
>  	FB	The frame buffer device is enabled.
>  	GCOV	GCOV profiling is enabled.
>  	HW	Appropriate hardware is enabled.
> +	HWPOISON Handling of memory pages reported as being corrupt

These entries are normally used as in my example below.  I'm not sure that
it makes sense here.

>  	IA-64	IA-64 architecture is enabled.
>  	IMA     Integrity measurement architecture is enabled.
>  	IOSCHED	More than one I/O scheduler is enabled.
> @@ -373,6 +374,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  
>  	autotest	[IA64]
>  
> +	badram=		When CONFIG_MEMORY_FAILURE is set, this parameter

        badram=		[HWPOISON] When CONFIG_MEMORY_FAILURE is set, this parameter

> +			allows memory areas to be flagged as HWPOISON.
> +			Format: <addr>,<mask>[,...]
> +			See Documentation/BadRAM.txt
> +
>  	baycom_epp=	[HW,AX25]
>  			Format: <io>,<mode>
>  
> -- 

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>