Robert Hancock wrote:
>> I'm assuming that the kernel will retry these requests after the SATA
>> link has been reset?
>
> Yes.
>
>> The errors appear to be randomly distributed over the four drives on
>> this machine - all are Seagate ST31000340NS with either firmware version
>> SN05 or SN16...
>
> This kind of problem often seems to be due to signal integrity or
> power problems. For whatever reason, an insufficient power supply (or
> something like overloading one power cable) can tend to trigger SATA
> errors as an early symptom.

Thanks for the reply, Robert. The power (and SATA signal) delivery to
these drives is via a hot-swap backplane which is built into the chassis.
I had considered some sort of hardware fault here, and that seems
possible, but I don't really have any way to check, as I don't have
access to another one of these machines in order to swap out parts.
The IPMI info looks OK (although I realise this may not catch transient
power problems at the drives).

The timeouts appear to happen about 4 times per month. In the absence
of any other easy strategies, I've disabled SMART data collection on
this machine, on the off-chance that it makes a difference.
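
To check whether the errors really are spread evenly over the ports, and
whether the rate changes with SMART collection off, a per-port tally of
the kernel messages might help. The following is only a rough sketch: it
assumes the libata messages carry the usual "ataN:" or "ataN.MM:" prefix,
and both the keyword list and the function name count_ata_errors are my
own choices for illustration, not anything taken from the kernel.

```python
import re
from collections import Counter

# Assumption: libata prefixes its messages with "ataN:" (port N) or
# "ataN.MM:" (device MM on port N). Adjust if your logs look different.
ATA_PREFIX = re.compile(r"\bata(\d+)(?:\.\d+)?: (.*)")

# Assumption: substrings that mark a line as error-related. This list is
# illustrative and should be tuned against the messages actually seen.
KEYWORDS = ("timeout", "timed out", "exception", "hard resetting link")


def count_ata_errors(lines):
    """Return a Counter mapping ATA port number -> matching error lines."""
    counts = Counter()
    for line in lines:
        match = ATA_PREFIX.search(line)
        if match and any(word in match.group(2) for word in KEYWORDS):
            counts[int(match.group(1))] += 1
    return counts
```

It takes any iterable of log lines (an open log file, or the captured
output of dmesg), so the same function can be run over logs from before
and after the SMART change to compare the counts per port.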
Cheers,
Tim.