On Sun, 2007-02-04 at 01:21 -0800, Darrick J. Wong wrote:
> James Bottomley wrote:
>
> > There's a problem somewhere with your error handler changes (which I
> > picked up thanks to the problems with the V28 firmware). What I see
> > without your changes is that for a directly attached SATA device, when
> > the firmware begins its death spiral, the commands all return and
> > eventually send I/O errors to the filesystem. With your patch series
> > applied, it just loops forever giving messages like:
> >
> > Feb 3 12:07:06 localhost kernel: aic94xx: escb_tasklet_complete: phy5: LINK_RESET_ERROR
> > Feb 3 12:07:06 localhost kernel: aic94xx: phy5: Receive FIS timeout
> > Feb 3 12:07:06 localhost kernel: aic94xx: phy5: retries:0 performing link reset seq
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: aic94xx: control_phy_tasklet_complete: phy5, lrate:0x8, proto:0xe
> > Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: Enter sas_scsi_recover_host
> > Feb 3 12:07:06 localhost kernel: sas: --- Exit sas_scsi_recover_host
>
> Interesting, since the opposite happens with SAS disks. :)

Well, the initial error is a firmware induced drive error of some type.

> The infinite loop is usually what happens if a scsi_cmnd gets pulled off
> the eh queue without being scsi_eh_finish_cmnd()'d. Can you send me the
> whole dmesg? It's possible that we're trying to abort a command, which
> of course fails for a SATA disk, so we try bigger and bigger hammers....
> and the big hammers don't call scsi_eh_finish_cmd.

I've put the full log from detection of the aic94xx to forced power off
(all 512k of it) at

http://www2.kernel.org:/pub/linux/kernel/people/jejb/klog.aic94xx.failure.txt

(give it a while for the kernel.org mirrors to propagate)

> Did these SATA link reset errors only start showing up after the v28
> firmware patch, or has this always happened? I've noticed lately that I
> get link reset errors if I run a short exercise on an ext3 filesystem on
> a SATA disk, yet dd exercise runs just fine. But I had also thought
> that it was just my flaky hardware. :)

Er ... no idea ... The problem only shows up with V28 firmware, so I've
never seen a SATA disc fail with the V17 firmware.