On Thu, 2008-03-20 at 20:15 +0100, Raoul Bhatia [IPAX] wrote:
> James Bottomley wrote:
> > This is all normal. Seagate drives are known for throwing protocol
> > errors under stress at certain revs of firmware. That's what
> > REQ_TASK_ABORT, reason=0x6 is.
> >
> > Your logs indicate that the recovery occurred correctly (as in all tasks
> > were eventually retried), so it doesn't show an actual problem.
>
> ok, i already filed a trouble ticket at seagate - lets see if they
> provide a firmware update for the disks. afaik mine is "firmware 0002"
>
> >> sometimes even a disk is kicked out of the raid configuration.
> >
> > This would be abnormal, if you have a log of this, could you post it. I
> > assume it was because of I/O errors?
>
> i attached a bigger syslog file (.gz format).

OK, this looks more definitive, thanks!

What appears to be happening is that you get a run of protocol errors, not necessarily all on the same command, but what happens every time (by current design of the aic94xx driver) is that we halt the aic94xx, abort all the outstanding commands and resubmit them. Because the disk is being hammered, there are rather a lot of outstanding commands, so all it takes is five protocol errors in a few seconds for one unlucky command to get aborted five times (not necessarily through any fault of its own) and run out of retries. This causes it to be returned to the upper layers with DID_ABORT and treated as an I/O error.

A workaround might be to lower the queue depth to, say, 4 or 8 and raise the retries (the latter can only be done by altering the SD_MAX_RETRIES parameter in include/scsi/sd.h and recompiling).

Longer term, I think REQ_TASK_ABORT needs to be handled better on the fly. What we should do is abort only the task we've been asked to abort and return it to the upper layer for a retry without invoking the error handler ... I can look into this, but it will take a while.
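A toy model may help make the retry arithmetic above concrete. This is a hypothetical sketch, not aic94xx or sd code: the names `simulate`, `complete_oldest`, `HALT_ALL`, `ABORT_ONE`, `MAX_RETRIES` and `MAX_QUEUE` are invented for illustration, `MAX_RETRIES` only stands in for SD_MAX_RETRIES, and it assumes the queue is always full, that a fixed number of the oldest commands complete between consecutive protocol errors, and that a command fails once it has been aborted `MAX_RETRIES` times. It contrasts the current "abort everything" behaviour with the proposed "abort only the named task" behaviour.

```c
#include <assert.h>
#include <string.h>

/* Toy model only. MAX_RETRIES stands in for SD_MAX_RETRIES in
 * include/scsi/sd.h; 5 matches the "aborted five times" scenario. */
#define MAX_RETRIES 5
#define MAX_QUEUE   64

enum abort_policy {
    HALT_ALL,  /* current design: halt, abort and resubmit every command */
    ABORT_ONE  /* proposal: abort only the task the target asked about */
};

/* Drop the k oldest commands (they completed) and refill the tail with
 * fresh commands, since a hammered disk always has a full queue. */
static void complete_oldest(int *aborts, int depth, int k)
{
    memmove(aborts, aborts + k, (size_t)(depth - k) * sizeof *aborts);
    for (int i = depth - k; i < depth; i++)
        aborts[i] = 0;
}

/* Returns how many commands run out of retries (and would go back to the
 * upper layers as DID_ABORT) when `errors` protocol errors arrive, with
 * `served` commands completing between consecutive errors. */
int simulate(enum abort_policy policy, int depth, int served, int errors)
{
    int aborts[MAX_QUEUE] = {0}; /* per-command abort count, head at [0] */
    int failed = 0;

    assert(depth > 0 && depth <= MAX_QUEUE);
    assert(served >= 0 && served <= depth);

    for (int e = 0; e < errors; e++) {
        complete_oldest(aborts, depth, served);

        /* The protocol error names the head command as the task to abort;
         * HALT_ALL charges an abort to every outstanding command instead. */
        int victims = (policy == HALT_ALL) ? depth : 1;
        for (int i = 0; i < victims; i++)
            aborts[i]++;

        /* Commands that reached the limit fail; survivors are compacted
         * toward the head and fresh commands fill the freed slots. */
        int kept = 0;
        for (int i = 0; i < depth; i++) {
            if (aborts[i] >= MAX_RETRIES)
                failed++;
            else
                aborts[kept++] = aborts[i];
        }
        for (int i = kept; i < depth; i++)
            aborts[i] = 0;
    }
    return failed;
}
```

Under these assumptions, simulate(HALT_ALL, 32, 4, 5) reports 16 failed commands while simulate(HALT_ALL, 32, 4, 4) reports none, so the fifth error is what tips the long-lived commands over. Lowering the depth (simulate(HALT_ALL, 8, 4, 5)) or aborting only the named task (simulate(ABORT_ONE, 32, 4, 5)) also reports none, which is the intuition behind both the workaround and the longer-term fix.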
James