Re: mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock

Phil Turmel <philip@xxxxxxxxxx> · Tue, 14 Jan 2014 13:43:05 -0500

On 01/14/2014 12:47 PM, Wilson Jonathan wrote:

[trim /]

> I understand the issue of "timeout" on drives that might perform long
> error checking, which then causes mdadm, via the device (block?) driver
> issuing a timeout, to kick the drive. In this instance you allow
> some time for a drive to try and fix things at the expense of a hung
> array for a longer period of time.
> 
> I also understand that with scterc the drive gives up (in effect timing
> itself out) when it hits the 7 second, or thereabouts, mark and
> subsequently mdadm kicks the drive out. In this specific instance the
> idea is to kill a drive quickly so that the raid doesn't hang longer
> than a few seconds.

No.  The intent is to fail the read without failing the controller channel.
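For drives that support it, the ~7 second limit is set through smartctl's SCT ERC option, which takes the read and write limits in tenths of a second. Drives without SCT ERC are usually handled the other way round, by raising the kernel's per-device command timeout in sysfs. A minimal Python sketch that only builds the command and the sysfs path (nothing is executed); `/dev/sdX` is a placeholder and the helper names are hypothetical:

```python
"""Build (but do not run) the command and sysfs path commonly used to line
up drive error recovery with the kernel's command timeout.
Helper names are hypothetical; device paths are placeholders."""


def scterc_command(device: str, read_s: float = 7.0, write_s: float = 7.0) -> list:
    # smartctl expects SCT ERC limits in tenths of a second: 7.0 s -> 70.
    read_ds = round(read_s * 10)
    write_ds = round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def timeout_sysfs_path(device: str) -> str:
    # The per-device SCSI command timeout (in seconds) is exposed here.
    # "/dev/sdX" -> "sdX"
    name = device.rsplit("/", 1)[-1]
    return f"/sys/block/{name}/device/timeout"


if __name__ == "__main__":
    print(" ".join(scterc_command("/dev/sdX")))
    print(timeout_sysfs_path("/dev/sdX"))
```

Keeping the helpers side-effect free means the output can be reviewed before anything touches a real disk.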

> However surely these things (bar the amount of time) lead to the same
> final result of a drive being kicked out. Even in a non-mdadm hardware
> raid setup, the drive is either kicked because it didn't return in 7
> seconds, or the drive kicks itself because it gave up before 7
> seconds.

No.  Upon a failed read, MD will obtain/reconstruct the problem sector
from remaining redundancy, then write the correct data back.  Occasional
read errors of this type are normal, and fix themselves when the sector
is written again.  MD will only fail a drive after *multiple* read
errors, not just one.  (It tolerates isolated bursts of up to 20, then roughly ten per hour.)
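To make the "bursts of up to 20, then roughly ten per hour" behaviour concrete, here is a toy Python model. It is a sketch under stated assumptions, not the md implementation: the limit of 20 and the "halve the counter once per elapsed hour" decay are assumed values chosen to reproduce the figures in this post, and `Member` / `on_read_error` are hypothetical names.

```python
"""Toy model: each read error is repaired from redundancy and rewritten;
the member is failed only when corrected errors pile up too fast.
ASSUMPTIONS: limit of 20 and halve-per-hour decay are illustrative values
matching the figures quoted in the post, not taken from md source."""

from dataclasses import dataclass


@dataclass
class Member:
    name: str
    read_errors: int = 0
    last_error_hour: int = 0
    failed: bool = False


MAX_READ_ERRORS = 20  # assumed burst limit


def on_read_error(member: Member, hour: int) -> str:
    if member.failed:
        return "already failed"
    # Assumed decay: halve the counter once per full hour since the last error.
    elapsed = hour - member.last_error_hour
    if elapsed > 0:
        member.read_errors >>= elapsed
    member.last_error_hour = hour
    member.read_errors += 1
    if member.read_errors > MAX_READ_ERRORS:
        member.failed = True
        return "failed"
    # Normal path: reconstruct the sector from the other members, write it back.
    return "reconstructed and rewritten"


if __name__ == "__main__":
    steady = Member("sdb")
    for hour in range(24):
        for _ in range(10):
            on_read_error(steady, hour)
    print("10/hour for a day, failed:", steady.failed)

    busy = Member("sdc")
    for hour in range(10):
        for _ in range(11):
            on_read_error(busy, hour)
    print("11/hour, failed:", busy.failed)
```

Under these assumptions a steady ten errors per hour settles just below the limit, while eleven per hour climbs past it within a few hours, so the drive is only failed when errors keep arriving faster than they decay.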

[trim /]

> Surely, unless I'm missing something, rebuilding a failed drive's data
> means that you want the system to not kick if at all possible, and having
> scterc enabled or a short timeout (shorter than the drive's max time,
> unless that time is indefinite retry) is the last thing you want?

What you are missing is what happens when the controller channel times
out.  The original read is reported failed to MD while the driver tries
to revive the unresponsive drive.  MD proceeds to obtain/reconstruct the
missing data, then write it back.  But the device is not communicating--the
driver has reset the channel, and the drive will stay unresponsive until
its firmware finally gives up on the original read.  So the
*write* fails instantly, kicking the drive out of the array.
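The race described above comes down to which side gives up first: the drive's internal error recovery or the kernel's command timeout. A toy Python sketch of the two outcomes; the timings are assumed example values (about 7 s with SCT ERC enabled, the kernel's usual 30 s default command timeout, and a hypothetical 120 s internal retry for a drive without ERC), and `outcome` is a hypothetical helper name:

```python
"""Toy model of the drive-vs-kernel timeout race.
ASSUMED timings, not measurements: ~7 s with SCT ERC, 30 s default kernel
command timeout, 120 s internal retry for a drive without ERC."""


def outcome(drive_recovery_s: float, kernel_timeout_s: float) -> str:
    if drive_recovery_s < kernel_timeout_s:
        # The drive reports the bad sector in time; MD rebuilds the data
        # from redundancy and rewrites it, and the drive stays in the array.
        return "read error -> sector rewritten, drive kept"
    # The kernel gives up first and resets the channel. MD still rewrites
    # the sector, but the drive is busy retrying and does not answer,
    # so the write fails and MD kicks the drive.
    return "timeout -> rewrite fails, drive kicked"


if __name__ == "__main__":
    cases = [
        ("ERC drive, default timeout", 7, 30),
        ("no ERC, default timeout", 120, 30),
        ("no ERC, raised timeout", 120, 180),
    ]
    for label, recovery, timeout in cases:
        print(f"{label}: {outcome(recovery, timeout)}")
```

Either fix works in this model as long as the drive's recovery time ends up shorter than the kernel's timeout.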

When you, the admin, get around to looking, the drive is idle but
apparently fine.  (It gains a "pending" sector, which stays until the
drive is told to write over that spot.)

HTH,

Phil