Re: Software RAID when it works and when it doesn't

Support <support@xxxxxxxxx> · Wed, 17 Oct 2007 16:53:52 -0500

On Tue, 2007-10-16 at 17:57 -0400, Mike Accetta wrote:

> Was the disk driver generating any low-level errors or otherwise
> indicating that it might be retrying operations on the bad drive at
> the time (i.e. console diagnostics)?  As Neil mentioned later, the md layer
> is at the mercy of the low-level disk driver.  We've observed abysmal
> RAID1 recovery times on failing SATA disks because all the time is
> being spent in the driver retrying operations which will never succeed.
> Also, read errors don't tend to fail the array, so when the bad disk is
> again accessed for some subsequent read, the whole hopeless retry process
> begins anew.

The console was full of errors like:

end_request: I/O error, dev sdb, sector 42644555

I don't know what generates those messages.
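
Lines of this form appear to come from the kernel's generic block layer rather than from md itself, though that is stated here from memory and not confirmed in this thread. To see how many such errors each device is logging, a minimal sketch along these lines could be run over saved console output. The function names are hypothetical, and the pattern assumes the exact message format shown above:

```python
import re
from collections import Counter

# Matches lines such as: end_request: I/O error, dev sdb, sector 42644555
# Assumption: the message format is exactly the one quoted in this thread.
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev ([^,\s]+), sector (\d+)")


def parse_io_error(line):
    """Return (device, sector) for an I/O error line, or None if it does not match."""
    match = IO_ERROR_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def count_errors_by_device(lines):
    """Tally matching I/O error lines per device name."""
    counts = Counter()
    for line in lines:
        parsed = parse_io_error(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

A device whose count keeps climbing while the array stays up would be consistent with the retry behaviour Mike describes.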

I asked this before but never got an answer: is there a way to implement
timeouts within the md code itself, so that we are not at the mercy of the
lower-layer drivers?
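
To make the idea concrete, here is a purely user-space analogy, not md code and not a kernel interface. It is a minimal sketch assuming a blocking operation that the caller wants to stop waiting on after a deadline; `call_with_deadline` is a hypothetical helper name:

```python
import concurrent.futures


def call_with_deadline(operation, timeout_s):
    """Run operation() in a worker thread and wait at most timeout_s seconds.

    Returns (True, result) if it finished in time, or (False, None) on timeout.
    The worker is NOT cancelled on timeout; it keeps running in the background.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(operation)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        # Do not block here waiting for a stuck operation to finish.
        executor.shutdown(wait=False)
```

The sketch also shows the limitation: the caller can give up waiting, but the underlying operation is still in flight, much as a request already handed to the driver would be.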

> 
> I posted a patch about 6 weeks ago which attempts to improve this situation
> for RAID1 by telling the driver not to retry on failures and giving some
> weight to read errors for failing the array.  Hopefully, Neil is still
> mulling it over and it or something similar will eventually make it into
> the mainline kernel as a solution for this problem.
> --
> Mike Accetta
> 
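
The policy Mike describes, giving read errors some weight toward failing a member, can be illustrated with a toy model. This is not his patch and not how md is implemented. It is a minimal sketch in which the class name, the threshold of 3, and the helper function are all hypothetical:

```python
class MirrorMember:
    """Toy model of one disk in a two-way mirror."""

    def __init__(self, name, max_read_errors=3):
        self.name = name
        self.max_read_errors = max_read_errors  # hypothetical threshold
        self.read_errors = 0
        self.failed = False

    def record_read_error(self):
        """Count a read error; mark the member failed once the threshold is reached."""
        self.read_errors += 1
        if self.read_errors >= self.max_read_errors:
            self.failed = True
        return self.failed


def pick_read_source(members):
    """Return the first member not marked failed, or None if all have failed."""
    for member in members:
        if not member.failed:
            return member
    return None
```

Once a member crosses the threshold it is no longer chosen for reads, so the hopeless retry cycle on that disk does not start again.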

Thanks,

Alberto