--- On Thu, 3/20/08, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 2008-03-20 at 20:15 +0100, Raoul Bhatia [IPAX] wrote:
> > James Bottomley wrote:
> > > This is all normal. Seagate drives are known for throwing protocol
> > > errors under stress at certain revs of firmware. That's what
> > > REQ_TASK_ABORT, reason=0x6 is.
> > >
> > > Your logs indicate that the recovery occurred correctly (as in all tasks
> > > were eventually retried), so it doesn't show an actual problem.
> >
> > ok, i already filed a trouble ticket at seagate - lets see if they
> > provide a firmware update for the disks. afaik mine is "firmware 0002"
> >
> > >> sometimes even a disk is kicked out of the raid configuration.
> > >
> > > This would be abnormal, if you have a log of this, could you post it. I
> > > assume it was because of I/O errors?
> >
> > i attached a bigger syslog file (.gz format).
>
> OK, this looks more definitive, thanks!
>
> What appears to be happening is that you get a run of protocol errors,
> not necessarily all on the same command, but what happens every time (by
> current design of the aic94xx driver) is that we halt the aic94xx, abort
> all the outstanding commands and resubmit them. Because the disk is
> being hammered, there are rather a lot, so all it takes is five protocol
> errors in a few seconds for one unlucky command to get aborted five
> times (not necessarily through any fault of its own) and run out of
> retries. This causes it to return to the upper layers with DID_ABORT
> and be treated as an I/O error.
>
> A work around might be to lower the queue depth to say 4 or 8 and up the
> retries (this latter can only be done by altering the SD_MAX_RETRIES
> parameter in include/scsi/sd.h and recompiling).
>
> Longer term, I think REQ_TASK_ABORT needs to be handled better on the
> fly.
> What we should do is abort only the task we've been asked to abort
> and return it to the upper layer for a retry without invoking the error
> handler ... I can look into this, but it will take a while.

The original driver, from which you forked off, has always supported this correct (SCSI) behaviour.

   Luben
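
To illustrate the retry exhaustion described in the quoted text, here is a toy model in Python. It is not driver code: the `Command` class, the `failed_commands` function and the rule that a command fails once it has been aborted `max_retries` times are all assumptions made for illustration, and the model assumes the worst case where every command stays outstanding for the whole burst of errors.

```python
from dataclasses import dataclass


@dataclass
class Command:
    tag: int
    aborts: int = 0


def failed_commands(error_tags, queue_depth, abort_all, max_retries=5):
    """Count commands that run out of retries during a burst of protocol errors.

    error_tags: tag of the command each protocol error was reported against
    abort_all:  True  = every outstanding command is aborted on each error
                False = only the command named by the error is aborted
    """
    # Simplification (assumption): every command stays outstanding
    # for the whole burst, so each one sees every error.
    queue = [Command(tag) for tag in range(queue_depth)]
    for bad_tag in error_tags:
        for cmd in queue:
            if abort_all or cmd.tag == bad_tag:
                cmd.aborts += 1
    # Hypothetical accounting: a command fails once it has been
    # aborted max_retries times. The real mid-layer bookkeeping may differ.
    return sum(1 for cmd in queue if cmd.aborts >= max_retries)


burst = [0, 1, 2, 3, 4]  # five errors, each against a different command
print(failed_commands(burst, queue_depth=32, abort_all=True))
print(failed_commands(burst, queue_depth=32, abort_all=False))
print(failed_commands(burst, queue_depth=32, abort_all=True, max_retries=10))
```

With abort-everything handling, five errors against five different commands are enough to exhaust the retries of every command in this model. Aborting only the named task, or raising the retry limit, leaves none failed. In practice commands complete between errors, so only the unlucky ones that stay queued across the whole burst would fail, which is presumably why a shallower queue is suggested as a workaround.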