Re: problem killing raid 5


All the drives are identical, and they are in identical USB enclosures. I am starting to suspect the USB layer; it frequently resets the enclosures. I'll have to look at that first. In any case, I had the array working for some time before this.
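(A quick way to confirm suspected enclosure resets is to search the kernel ring buffer; this is only a sketch, and the exact log wording varies by kernel version and driver:)

```shell
# usb-storage devices that get reset typically leave lines like
# "usb 1-2: reset high speed USB device using ehci_hcd ..."
# in the kernel log. Filter the ring buffer for them:
dmesg | grep -iE 'usb.*reset|reset.*usb'
```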

Justin Piszcz wrote:


On Mon, 1 Oct 2007, Daniel Santos wrote:

It stopped the reconstruction process, and the output of /proc/mdstat was:

oraculo:/home/dlsa# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
md0 : active raid5 sdc1[3](S) sdb1[4](F) sdd1[0]
    781417472 blocks level 5, 256k chunk, algorithm 2 [3/1] [U__]

I then stopped the array and tried to assemble it with a scan:

oraculo:/home/dlsa# mdadm --assemble --scan
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.
oraculo:/home/dlsa# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
md0 : inactive sdd1[0](S) sdc1[3](S) sdb1[1](S)
    1172126208 blocks

I had to list the fourth drive in mdadm.conf as missing.

The result was that, because of the read error, the reconstruction of the new array aborted, and the assemble came up with an array that looks like the one that failed before I created the new one.
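(For the record, the usual way out of an "assembled from 1 drive and 1 spare" state is a forced assembly. This is only a sketch, with device names taken from the mdstat output above; --force overrides event-count mismatches, so inspect the superblocks first and list the members whose data is most current:)

```shell
# Compare the per-member event counters and states before touching anything.
mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 | grep -E 'Events|State'

# Stop the half-assembled, inactive array...
mdadm --stop /dev/md0

# ...then force-assemble it from the named members. --force tells mdadm
# to accept small event-count differences instead of refusing to start.
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
```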

I am running Debian with a 2.6.22 kernel.


Michael Tokarev wrote:
Patrik Jonsson wrote:

Michael Tokarev wrote:

[]

But in any case, md should not stall - be it during reconstruction
or not.  On this I can't comment further - to me it smells like a bug
somewhere (md layer? error handling in a driver? something else?)
which should be found and fixed.  And for that, some more details
are needed, I guess - the kernel version is a start.

Really? It's my understanding that if md finds an unreadable block
during raid5 reconstruction, it has no option but to fail, since the
information can't be reconstructed. When this happened to me, I had to


Yes indeed, it should fail, but not get stuck as Daniel reported.
That is, it should either complete the work or fail, not sleep
somewhere in between.

[]

This is why it's important to run a weekly check, so that md can repair bad blocks
*before* a drive fails.
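(For reference, that periodic check can be triggered by hand through sysfs; a sketch only - on Debian the mdadm package also ships a cron job that does the same on a schedule:)

```shell
# Ask md to read every block of the array. "check" reads only, rewriting
# any unreadable sectors from parity; "repair" additionally rewrites
# parity that does not match the data.
echo check > /sys/block/md0/md/sync_action

# Watch progress the usual way.
cat /proc/mdstat
```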


*nod*.

/mjt



-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Yikes. By the way, are all those drives on the same chipset? What type of drives did you use?

Justin.


