Re: Reduce Timeout on Disk Failure

jim@rubylane.com wrote:
> 
> If this is patched, I hope it is also put into a 2.2 update.  When a
> SW raid is running, a couple of I/O retries might be reasonable, but
> not heroic recovery attempts that would make good sense in a
> single-disk environment.

Yes, the md driver in 2.2 had a ridiculously large retry loop for I/O
failures...if I counted correctly, it did 4096 retries on each I/O
failure! This usually means that one of the lower-level drivers ends up
stuck in a pretty tight error-handling loop...
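
To make the difference concrete, here is a minimal user-space sketch of a
bounded retry loop. It is not the md driver's code; submit_bounded(),
dead_disk() and flaky_disk() are hypothetical names made up for
illustration, and the only figure taken from above is the 4096 retry count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical I/O callback: returns 0 on success, nonzero on failure. */
typedef int (*io_fn)(void *ctx);

/* Outcome of a bounded retry: final status plus the attempts made. */
struct retry_result {
    int status;    /* 0 = success, -1 = gave up */
    int attempts;  /* total calls to the I/O callback */
};

/*
 * Try the I/O once, then retry at most max_retries times. On giving up,
 * the caller can mark the device failed and fall back to another mirror
 * instead of spinning on a dead disk.
 */
static struct retry_result submit_bounded(io_fn io, void *ctx, int max_retries)
{
    struct retry_result r = { -1, 0 };

    assert(io != NULL && max_retries >= 0);
    for (int i = 0; i <= max_retries; i++) {
        r.attempts++;
        if (io(ctx) == 0) {
            r.status = 0;
            break;
        }
    }
    return r;
}

/* Simulated dead disk (assumption for the demo): every request fails. */
static int dead_disk(void *ctx)
{
    (void)ctx;
    return -1;
}

/* Simulated flaky disk: fails until the counter behind ctx reaches zero. */
static int flaky_disk(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

With max_retries set to 2, a dead disk costs three attempts before the
caller can give up on it; with 4096, the same dead disk costs 4097
attempts, each of which may itself block inside a lower-level driver.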

 
> We did a simple test of powering down an IDE drive that was part of an
> (idle) SW raid, then trying to access the filesystem, and the system
> just locked up.  Maybe it would have eventually come back to life - I
> dunno.

Yep, we tried similar things with a network block device (breaking the
network connection)...we ended up hacking the raid1 and nbd drivers,
inserting schedule() calls just to mitigate the effects of the retries a
little bit...at least we got the system not to hang completely while the
retries were going on...
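
As a rough user-space analogy of that workaround (not the actual raid1 or
nbd patch), the sketch below yields the CPU between failed attempts. It
uses POSIX sched_yield() as a stand-in for the in-kernel schedule() call;
retry_with_yield() and broken_link() are hypothetical names.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical I/O attempt: returns 0 on success, nonzero on failure. */
typedef int (*io_attempt_fn)(void *ctx);

/*
 * Retry up to max_attempts times, yielding the CPU after each failed
 * attempt so other runnable tasks make progress while the retries run.
 * The number of yields is reported through *yields.
 */
static int retry_with_yield(io_attempt_fn attempt, void *ctx,
                            int max_attempts, int *yields)
{
    assert(attempt != NULL && max_attempts > 0 && yields != NULL);

    *yields = 0;
    for (int i = 0; i < max_attempts; i++) {
        if (attempt(ctx) == 0)
            return 0;
        if (i + 1 < max_attempts) {
            sched_yield();  /* user-space stand-in for schedule() */
            (*yields)++;
        }
    }
    return -1;
}

/* Simulated broken network link: counts calls and always fails. */
static int broken_link(void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    return -1;
}
```

Yielding does not shorten the retry sequence; it only keeps the rest of
the system responsive while the retries are in progress, which matches
the "mitigate a little bit" effect described above.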

 
> For the curious, we haven't upgraded to 2.4x because whenever I check
> the kernel traffic page, it seems there are still important bugs being
> found and corrected - ones we don't want to experience in a production
> setup.

Well, this particular retry problem does not exist in 2.4. And in
general, as far as software RAID is concerned, 2.4 is a lot better...I
know that, at least with raid1, you can fail a device at just about any
time (under heavy write activity, during a resync, etc.) and as often as
you want, and it doesn't hang...

--
Paul
