Re: md failing mechanism

On 24/01/2016 06:02, James J wrote:
On 23/01/2016 15:09, Wols Lists wrote:
On 22/01/16 23:40, James J wrote:
The recommendation to raise the timeout to 120+ serves the opposite
purpose of what you want. It is for the case where the sysadmin is
willing to wait a long time because he wants to prevent the drive from
being kicked at the first read error (normally drives are kicked for a
write error). This might be wanted a) to defer the replacement of the
drive, either to perform it at a more opportune time and/or in a better
manner such as a no-degrade replace operation, or b) because he does not
want to replace the drive at all: maybe he believes that the error is
spurious and will not happen again, and that the drive is still fit
enough for the purpose, e.g. in a low-cost file server.
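
For reference, the timeout under discussion is, as far as I understand
it, the per-device SCSI command timeout that Linux exposes in seconds at
/sys/block/<dev>/device/timeout. A minimal sketch of reading it follows.
The function names, the sysfs_root parameter and the 120-second default
threshold are illustrative assumptions, not part of any md tool.

```python
from pathlib import Path


def read_command_timeout(device: str, sysfs_root: str = "/sys") -> int:
    """Return the SCSI command timeout (seconds) for a block device.

    `device` is a kernel name such as "sda". `sysfs_root` is a
    hypothetical parameter added here so the function can be pointed
    at a test directory instead of the real /sys.
    """
    path = Path(sysfs_root) / "block" / device / "device" / "timeout"
    return int(path.read_text().strip())


def tolerates_long_recovery(timeout_s: int, wanted_s: int = 120) -> bool:
    """True if the timeout is at least the 120+ value suggested above."""
    return timeout_s >= wanted_s
```

On a real system, read_command_timeout("sda") would read the live value.
Raising it means writing a larger number to the same file as root.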
Except, aiui, even in your scenario, drives are kicked for a *write* error.

What happens (or should happen) is that the kernel times out, the raid
handles the read error by trying a rewrite, the drive is still hung on
the read error so it doesn't respond to the write request, and the drive
gets kicked for a write failure.

Oh yes, you are correct, so the drive would be kicked after 60 secs and not after 30 secs, contrary to what I said. So the sequence would be:

drive stuck on read
--> scsi read failure due to timeout at the 30th second
--> MD receives failure and attempts rewrite
--> scsi write failure due to timeout at the 60th second
--> drive kicked by MD at the 60th second

I think this is what should have happened, but it didn't happen like this anyway, so I think there is probably a kernel bug somewhere.
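
The arithmetic behind that sequence can be sketched as a small timeline model. This is only an illustration under stated assumptions: the drive stays hung for the whole period, the read and the rewrite each wait one full SCSI timeout, and MD issues the rewrite immediately. The function name and event labels are made up for this example and do not correspond to kernel code.

```python
def kick_timeline(scsi_timeout_s: int = 30):
    """Return (seconds, event) pairs for a drive that hangs on a read.

    Assumes the drive never answers, so both the read and the
    follow-up rewrite run into the same SCSI command timeout.
    """
    events = [(0, "read issued, drive hangs")]
    t = scsi_timeout_s
    events.append((t, "scsi read failure (timeout), MD attempts rewrite"))
    t += scsi_timeout_s
    events.append((t, "scsi write failure (timeout), MD kicks drive"))
    return events


if __name__ == "__main__":
    for seconds, event in kick_timeline():
        print(f"{seconds:>4}s  {event}")
```

With the 30-second timeout discussed here, the last event lands at the 60th second, which matches the corrected figure above.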
I don't have a lot to add, except that I recall the OP suggested it was an IDE drive. I wonder if the IDE subsystem and/or hardware operates differently from the SATA variants. Possibly the MD layer never got any timeout or error on the read (or maybe it was the write), and hence the drive was never kicked from the array.

Regards,
Adam


