Re: ext3 journal on software raid (was Re: PROBLEM: Kernel 2.6.10 crashing repeatedly and hard)

Michael Tokarev <mjt@xxxxxxxxxx> · Wed, 05 Jan 2005 17:37:34 +0300

Erik Mouw wrote:
On Wed, Jan 05, 2005 at 02:56:30PM +0400, Brad Campbell wrote:

I beg to differ on this one. Having spent several weeks tracking down
random processes dying on a machine, which turned out to be caused by a
bad sector in the swap partition, I have had great results by running
swap on a RAID-1. If you develop a bad sector in a non-mirrored swap,
bad things happen indeterminately and can be a royal PITA to chase
down. It's just a little extra peace of mind.

If you have a bad block in your swap partition and the device doesn't
report an error about it, no amount of RAID is going to help you
against it.

The drive IS reporting read errors in most cases.  But that does not
help, really: the kernel swapped out some memory but can't read it
back, so things are screwed.  It's just like hot-removing a DIMM while
the system is running: the kernel loses part of its memory and can't
work anymore.  Depending on what was in there, of course: the whole
system may be screwed, or just a single process...  The discussion
here isn't about "undetectable" (unreported etc) errors, but about the
fact that the error is there.  And if your swap is on raid and one
component of the array behaves badly, another component will continue
to work, so the system will keep running just fine, as if nothing
happened, when one of the "swap components" (I mean the underlying
devices) fails for whatever reason.
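
To make the point concrete, here is a toy model in Python.  It is only
a sketch and not how the md driver is actually implemented: Disk,
Mirror and ReadError are names made up for illustration.  It shows a
reported read error being fatal on a single device but harmless on a
mirror, as long as the other component still holds a good copy.

```python
class ReadError(Exception):
    """A device *reported* that it could not read a sector."""


class Disk:
    """Hypothetical block device with a set of unreadable sectors."""

    def __init__(self, name, bad_sectors=()):
        self.name = name
        self.bad_sectors = set(bad_sectors)
        self.data = {}

    def write(self, sector, page):
        self.data[sector] = page

    def read(self, sector):
        if sector in self.bad_sectors:
            raise ReadError(f"{self.name}: read error at sector {sector}")
        return self.data[sector]


class Mirror:
    """Hypothetical RAID-1-like device: write to all, read from any."""

    def __init__(self, *disks):
        self.disks = list(disks)

    def write(self, sector, page):
        for disk in self.disks:
            disk.write(sector, page)

    def read(self, sector):
        # Try each component in turn; a reported error on one of them
        # just means we fall through to the next copy.
        for disk in self.disks:
            try:
                return disk.read(sector)
            except ReadError:
                continue
        raise ReadError(f"no readable copy of sector {sector}")


sda = Disk("sda", bad_sectors={7})   # this one has developed a bad sector
sdb = Disk("sdb")
md = Mirror(sda, sdb)
md.write(7, b"swapped-out page")
print(md.read(7))                    # served from sdb, nothing is lost
```

Reading sector 7 from sda alone raises ReadError, which for swap means
the page is simply gone.  Reading it through the mirror succeeds.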

And please, pretty PLEASE stop talking about those mysterious
"undetectable" or "unreported" errors here.  A drive that develops
"unreported" errors just does not work and should not be there in
the first place, just like bad memory or a bad CPU: if your cpu or
memory is failing, no software tricks will help, and the failing part
should be replaced BEFORE even thinking about possible ways to recover.

/mjt