Chris Webb wrote:
Chris Webb <chris@xxxxxxxxxxxx> writes:
Mark Lord <liml@xxxxxx> writes:
Speaking of which...
Chris: I wonder whether the errors would also vanish in your situation
if you disabled the onboard write caches in the drives?
E.g. hdparm -W0 /dev/sd?
Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
I check this one out on it too.
Our test machine is still being built, but we had an opportunity to try this on
a couple of the live machines when their RAID arrays failed over the weekend.
We still got timeouts, but (predictably!) they're not on flushes any more:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm
...
all the way through the night.
I also have these in the log, but they appear immediately after I turned off
write caching on all drives, so they may be a red herring caused by data still
being written out.
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm
...
On another machine, I saw this with write caching turned off:
ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
...
0x35 is a 48-bit DMA WRITE, 0xc8 is a 28-bit DMA READ,
and 0x61 is an NCQ WRITE.
Looks like some kind of hardware trouble to me.
And as Tejun suggested, it's difficult to guess at
a cause other than the PSU.
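
For anyone wanting to decode these "cmd" lines mechanically, here is a rough
Python sketch covering the three opcodes above. It assumes the taskfile is
printed as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device,
which is my reading of the libata error output rather than something stated
in this thread. The names decode_cmd and ATA_OPCODES are made up for
illustration, and a sector count of zero (which has a special meaning in ATA)
is not handled.

```python
import re

# Only the three opcodes discussed above; names follow the ATA command set.
ATA_OPCODES = {
    0x35: "WRITE DMA EXT",       # 48-bit DMA write
    0xC8: "READ DMA",            # 28-bit DMA read
    0x61: "WRITE FPDMA QUEUED",  # NCQ write
}

# Twelve hex bytes separated by '/' or ':' following the word "cmd".
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd(line):
    """Decode a libata 'cmd' log line; returns None if no taskfile is found."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    regs = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    cmd, feature, nsect, lbal, lbam, lbah = regs[0:6]
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device = regs[6:12]

    low = lbal | lbam << 8 | lbah << 16
    high = hob_lbal << 24 | hob_lbam << 32 | hob_lbah << 40

    if cmd == 0xC8:
        # 28-bit addressing: top four LBA bits live in the device register.
        lba = low | (device & 0x0F) << 24
        sectors = nsect
    elif cmd == 0x35:
        # 48-bit addressing: upper bytes come from the HOB registers.
        lba = low | high
        sectors = nsect | hob_nsect << 8
    elif cmd == 0x61:
        # NCQ: the sector count travels in the feature registers.
        lba = low | high
        sectors = feature | hob_feature << 8
    else:
        lba = None
        sectors = None

    return {
        "opcode": cmd,
        "name": ATA_OPCODES.get(cmd, "unknown"),
        "lba": lba,
        "sectors": sectors,
    }


if __name__ == "__main__":
    for sample in (
        "ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm",
        "ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm",
        "ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out",
    ):
        print(decode_cmd(sample))
```

Under that assumed layout, all three commands come out as 8-sector transfers,
which at 512 bytes per sector matches the "4096" shown on the NCQ line.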
Cheers, and good luck.