Re: Software RAID when it works and when it doesn't

Goswin von Brederlow <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx> · Fri, 26 Oct 2007 18:12:37 +0200

Bill Davidsen <davidsen@xxxxxxx> writes:

> Alberto Alonso wrote:
>> On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:
>>
>>
>>> I'm not sure the timeouts are the problem, even if md did its own
>>> timeout, it then needs a way to tell the driver (or device) to stop
>>> retrying. I don't believe that's available, certainly not
>>> everywhere, and anything other than everywhere would turn the md
>>> code into a nest of exceptions.
>>>
>>>
>>
>> If we lose the ability to communicate with that drive I don't see it
>> as a problem (that's the whole point, we kick it out of the array). So,
>> if we can't tell the driver about the failure we are still OK, md could
>> successfully deal with misbehaved drivers.
>
> I think what you really want is to notice how long the drive and
> driver took to recover or fail, and take action based on that. In
> general "kick the drive" is not optimal for a few bad spots, even if
> the drive recovery sucks.

Depending on the hardware you can still access a different disk while
another one is resetting. But since md has no timeout of its own, it
won't try any other disk while one is stuck.

That is exactly what I miss.
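
To make the idea concrete, here is a rough user-space sketch of a
mirrored read that gives up on a stuck disk after a timeout and moves
on to the next mirror. This is not md code and not how the kernel
would do it; the names (read_with_timeout, mirrored_read, "sda",
"sdb") and the 0.2 second timeout are made up for illustration, and
the "disks" are plain Python functions.

```python
import queue
import threading


def read_with_timeout(read, block, timeout):
    """Run read(block) in a helper thread; raise TimeoutError if it is too slow."""
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put((True, read(block)))
        except Exception as exc:  # hand the error back to the caller
            result.put((False, exc))

    # Daemon thread: a read that never returns must not block program exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = result.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError("read did not complete in time") from None
    if not ok:
        raise value
    return value


def mirrored_read(mirrors, block, timeout):
    """Try each (name, read) mirror in order; skip any that fail or time out."""
    for name, read in mirrors:
        try:
            return name, read_with_timeout(read, block, timeout)
        except OSError:  # TimeoutError is a subclass of OSError
            continue     # this disk is stuck or broken, try the next one
    raise OSError("all mirrors failed or timed out")


# Hypothetical demo: "sda" hangs as if it were resetting, "sdb" is healthy.
reset_done = threading.Event()


def stuck_disk(block):
    reset_done.wait()  # blocks until the simulated reset finishes
    return "data-%d" % block


def healthy_disk(block):
    return "data-%d" % block


name, data = mirrored_read(
    [("sda", stuck_disk), ("sdb", healthy_disk)], block=7, timeout=0.2
)
reset_done.set()  # let the simulated reset finish
print(name, data)
```

Note that the stuck read is only abandoned, not cancelled. That is the
same limitation Bill describes above: the timeout lets the caller move
on, but nothing here tells the "driver" to stop retrying.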

Regards,
        Goswin