On Tue, 2007-10-16 at 17:57 -0400, Mike Accetta wrote:
> Was the disk driver generating any low level errors or otherwise
> indicating that it might be retrying operations on the bad drive at
> the time (i.e. console diagnostics)? As Neil mentioned later, the md layer
> is at the mercy of the low level disk driver. We've observed abysmal
> RAID1 recovery times on failing SATA disks because all the time is
> being spent in the driver retrying operations which will never succeed.
> Also, read errors don't tend to fail the array so when the bad disk is
> again accessed for some subsequent read the whole hopeless retry process
> begins anew.

The console was full of errors like:

end_request: I/O error, dev sdb, sector 42644555

I don't know what generates those messages.

As I asked before, but never got an answer: is there a way to do timeouts
within the md code so that we are not at the mercy of the lower-layer
drivers?

>
> I posted a patch about 6 weeks ago which attempts to improve this situation
> for RAID1 by telling the driver not to retry on failures and giving some
> weight to read errors for failing the array. Hopefully, Neil is still
> mulling it over and it or something similar will eventually make it into
> the main line kernel as a solution for this problem.
> --
> Mike Accetta
>

Thanks,

Alberto

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
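The end_request lines quoted above name the device and the failing sector, so one quick way to see whether the errors are concentrated on a single array member is to tally them from a saved console log (for example, dmesg output). The Python sketch below does that. The function names and the max_errors threshold are hypothetical, and the flagging rule only illustrates the idea of giving read errors some weight when deciding to fail a disk; it is not how md behaves and not Mike's patch.

```python
import re
from collections import Counter

# Matches console lines of the form quoted in this thread, e.g.
# "end_request: I/O error, dev sdb, sector 42644555".
# search() is used so timestamp prefixes like "[  12.3] " are tolerated.
END_REQUEST_RE = re.compile(
    r"end_request: I/O error, dev (?P<dev>\S+), sector (?P<sector>\d+)"
)


def parse_end_request(line):
    """Return (device, sector) for an end_request I/O error line, else None."""
    m = END_REQUEST_RE.search(line)
    if m is None:
        return None
    return m.group("dev"), int(m.group("sector"))


def tally_io_errors(lines):
    """Count end_request I/O error messages per device."""
    counts = Counter()
    for line in lines:
        parsed = parse_end_request(line)
        if parsed is not None:
            dev, _sector = parsed
            counts[dev] += 1
    return counts


def suspect_devices(lines, max_errors=10):
    """Hypothetical policy: flag devices with more than max_errors errors."""
    counts = tally_io_errors(lines)
    return sorted(dev for dev, n in counts.items() if n > max_errors)


if __name__ == "__main__":
    log = [
        "end_request: I/O error, dev sdb, sector 42644555",
        "end_request: I/O error, dev sdb, sector 42644563",
        "md: some unrelated message",
    ]
    print(tally_io_errors(log))                 # Counter({'sdb': 2})
    print(suspect_devices(log, max_errors=1))   # ['sdb']
```

Counting messages rather than distinct sectors means a sector that is retried repeatedly is counted once per message, which is arguably what matters when the concern is time lost to hopeless retries.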