Hello,
Mark Lord wrote:
Mmm.. the one Tomas Lund has is on what appears to be AHCI (ICH9R).
Yes, indeed:
# lspci -v -nn -s 00:1f.2
00:1f.2 SATA controller [0106]: Intel Corporation 82801IR/IO/IH
(ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02)
(prog-if 01 [AHCI 1.0])
Subsystem: Super Micro Computer Inc Unknown device [15d9:d180]
..
(dmesg output available at http://tlund.pp.se/envy4_dmesg.txt)
Right.
Tomas, if you move the "problem drive" to another port, does the
error follow the drive, or stay with the same port?
..
Yes, the drive has already been moved, and the problem did indeed move
with the drive. However, I am currently stressing the drives by
copying large numbers of files in parallel sessions, and issuing
"sync" every 30 seconds, and I have not seen the error since Mar 28
18:17 (current time here is Apr 1 09:10).
The system does not have any data on it yet, and I am not really in a
hurry to get it into production. Willing to try anything to track this
problem down.
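For anyone wanting to reproduce a similar load, the workload described above (several parallel copy sessions plus a periodic "sync") could be sketched roughly as below. This is only an illustration, not the commands actually run on this system: the function name stress_copy, its arguments, and the session1..N directory layout are all assumptions made up for the example.

```shell
#!/bin/sh
# Rough sketch of the stress workload: N parallel copy sessions
# plus a "sync" issued at a fixed interval while they run.
# stress_copy and its arguments are hypothetical names for illustration.
# Usage: stress_copy SRC_DIR DST_DIR [SESSIONS] [SYNC_INTERVAL_SECONDS]
stress_copy() {
    src=$1
    dst=$2
    sessions=${3:-4}
    interval=${4:-30}
    pids=""

    # Start the parallel copy sessions, one destination directory each.
    i=1
    while [ "$i" -le "$sessions" ]; do
        mkdir -p "$dst/session$i"
        cp -r "$src/." "$dst/session$i/" &
        pids="$pids $!"
        i=$((i + 1))
    done

    # Background loop that issues sync every $interval seconds.
    (
        while :; do
            sleep "$interval"
            sync
        done
    ) >/dev/null 2>&1 &
    syncer=$!

    # Wait for every copy session and remember whether any of them failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done

    # Stop the sync loop and flush once more at the end.
    kill "$syncer" 2>/dev/null
    wait "$syncer" 2>/dev/null || true
    sync
    return "$status"
}
```

Something like `stress_copy /some/large/tree /mnt/test 4 30` would then approximate four sessions with a sync every 30 seconds; both paths here are placeholders. Watching dmesg while it runs would show whether the FLUSH error comes back.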
..
Mmm.. I suppose the thing to do is to move it back to the port it was
on when it failed, if you haven't already done that.
If things continue to go well there now, it would be reasonable
to assume that a flaky cable connection was the culprit,
and that simply rearranging the drives/cables produced a better connection.
I've never really seen transmission errors resulting in FLUSH failure.
All that needs to be transferred is the opcode, and according to the
SMART log, the drive got the command right but still aborted it. The
only thing I can think of right now is that running dd over the whole disk
brought it back to a sane state, but that's just wild speculation. Yes, I would
love to see whether the problem reproduces itself when moved back to the
original port.
Thanks.
--
tejun