On 05/24/2010 03:34 AM, Tim Small wrote:
On 21/05/10 21:57, Doug Ledford wrote:
On 05/21/2010 12:40 PM, MRK wrote:
On 05/21/2010 04:16 AM, Doug Ledford wrote:
Could the cabling to the drive be causing this? (maybe failing or maybe
it's partly disconnected)
I don't remember how far along Linux is in implementing the checksums
between the controller and the drive.
I don't know. I'm not up on the SATA signaling details so I don't know
if it uses CRC on the signal, but I suspect it does and a bad cable
would cause failed requests. But I wouldn't bet my house on it, so I
would ask some SATA gurus.
I wouldn't call myself that, but I believe PATA and SATA-level CRC
errors show up in the UDMA_CRC_Error_Count SMART variable - look for a
non-zero raw value in the smartctl output. This is presumably just the
error count from the drive's point of view (bad data received at the
drive end). I don't know what happens with CRC errors detected at the
Linux end, or whether detection is controller-dependent. Better ask on
linux-ide.
From the SMART attribute name, presumably the earlier PATA transfer
modes don't support CRC error detection.
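For anyone who wants to script the smartctl check described above, here is a minimal sketch. It assumes the usual ten-column attribute table printed by `smartctl -A`, with the raw value in the last column; the function names are made up for illustration, and some drives may not report this attribute at all.

```python
import subprocess


def crc_error_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None.

    Assumes the standard attribute table layout:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            try:
                return int(fields[9])
            except ValueError:
                # Raw value in a format this sketch does not understand.
                return None
    return None


def read_attributes(device):
    """Run `smartctl -A` on a device (needs root and smartmontools installed)."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit bits for warnings too
    )
    return result.stdout


# Example use (the device name is only an example):
#   count = crc_error_count(read_attributes("/dev/sda"))
#   A non-zero count means the drive saw corrupted transfers on the link.
```

A non-zero raw value only says the drive has seen CRC errors at some point; watching whether it increases during a test run is more telling than the absolute number.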
An easy thing to check might be to reduce the libata transfer speed from
3 Gbps to 1.5 Gbps. Similarly, try to test each drive and SATA port in
isolation if you can.
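One way to limit the link speed is the libata.force kernel parameter, e.g. booting with libata.force=1.5Gbps. To confirm what the links actually negotiated, something like the sketch below may help. It assumes a kernel that exposes /sys/class/ata_link/<link>/sata_spd, which not every kernel version does; the function name is made up for illustration.

```python
import glob
import os


def link_speeds(sysfs_root="/sys/class/ata_link"):
    """Map each ATA link name to its reported SATA speed string.

    Assumes the kernel exposes <sysfs_root>/<link>/sata_spd. Returns an
    empty dict if the directory does not exist or has no such files.
    """
    speeds = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "*", "sata_spd"))):
        link = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                speeds[link] = f.read().strip()
        except OSError:
            # Attribute present but unreadable; record it as unknown.
            speeds[link] = None
    return speeds


# Example use:
#   for link, speed in link_speeds().items():
#       print(link, speed)
```

If the speeds still show the higher rate after setting the parameter, the limit was not applied to that port, and the comparison test would not mean much.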
ATA transfer errors should cause a bad CRC, resulting in a failed
transfer, which will cause complaints in the kernel log. For PATA, only
UDMA modes can detect CRC errors; PIO and MWDMA transfers can't.
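To pick those complaints out of a saved kernel log, a rough filter like the one below could be a starting point. It assumes libata reports CRC problems with the tokens ICRC (in the device error field) or BadCRC (in SError) on lines prefixed with the ataN port name; exact wording varies by kernel version and driver, so treat the pattern as a guess to adjust rather than a complete list.

```python
import re

# Assumed pattern: an "ataN:" or "ataN.MM:" prefix followed somewhere by a
# CRC-related token. Other drivers or kernel versions may word this differently.
CRC_PATTERN = re.compile(r"\bata\d+(?:\.\d+)?:.*\b(?:ICRC|BadCRC)\b")


def crc_log_lines(log_text):
    """Return the lines of a kernel log that look like libata CRC errors."""
    return [line for line in log_text.splitlines() if CRC_PATTERN.search(line)]


# Example use, with the log saved to a file beforehand (e.g. dmesg > dmesg.txt):
#   with open("dmesg.txt") as f:
#       for line in crc_log_lines(f.read()):
#           print(line)
```

An empty result does not prove the link is clean; it only means nothing matching these two tokens was logged.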
There are other places where data corruption can occur, however, such as
inside the controller or the drive itself.