Re: sd 6:0:0:0: [sdb] Unaligned partial completion

On Mon, 2018-06-11 at 14:59 -0700, Ted Cabeen wrote:
> On 06/11/2018 02:40 PM, James Bottomley wrote:
> > On Mon, 2018-06-11 at 12:20 -0400, Douglas Gilbert wrote:
> > > I have also seen Aborted Command sense when doing heavy testing
> > > on one or more SAS disks behind a SAS expander. I put it down to
> > > a temporary lack of paths available (on the link between the
> > > host's HBA and the expander) when one of those SAS disks tries to
> > > get a connection back to the host with the data (data-in
> > > transfer) from an earlier READ command.
> > > 
> > > In my code (ddpt and sg_dd) I treat it as a "retry" type error
> > > and in my experience that works. IOW a follow-up READ with the
> > > same parameters is successful.
> > 
> > We do treat ABORTED_COMMAND as a retry.  However, it will tick down
> > the retry count (usually 3) and then fail if it still occurs.  How
> > long does this condition persist for? because if it's long lived we
> > could treat it as ADD_TO_MLQUEUE which would mean we'd retry until
> > the timeout condition was reached.
> 
> On my system, it's a bit hard to tell, as as soon as ZFS sees the
> read error, it starts resilvering to repair the sector that reported
> the I/O error.  Without the scrub, it happened once over a 5-day
> window.  During the scrub, it was usually 10s of minutes between
> occurrences that failed all the retries, but I had some occasions
> where it happened about 5-10 minutes apart.  It definitely seems to
> be load-related, so how long and hard the load stays elevated is a
> factor.

OK, try this: it will print a rate-limited warning if it triggers
(showing that this is the problem) and return ADD_TO_MLQUEUE for all
the SAS errors (we'll likely narrow this if it works, but for now let's
do the lot).

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 8932ae81a15a..94aa5cb94064 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -531,6 +531,11 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 		if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
 		    sdev->sdev_bflags & BLIST_RETRY_ASC_C1)
 			return ADD_TO_MLQUEUE;
+		if (sshdr.asc == 0x4b) {
+			printk_ratelimited(KERN_WARNING "SAS/SATA link retry\n");
+			return ADD_TO_MLQUEUE;
+		}
+
 
 		return NEEDS_RETRY;
 	case NOT_READY:
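
As a rough illustration of why the choice of disposition matters here,
the following is a user-space toy model, not kernel code. It assumes a
command that fails with ABORTED COMMAND for its first few attempts and
compares a bounded retry budget (the NEEDS_RETRY behaviour described
above, with a count of 3) against requeueing until a timeout (the
ADD_TO_MLQUEUE behaviour). All names (toy_run, toy_result, the TOY_
constants) and the time units are hypothetical, and the real mid-layer
accounting is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two dispositions; these are not the kernel's values. */
enum toy_disposition { TOY_NEEDS_RETRY, TOY_ADD_TO_MLQUEUE };

struct toy_result {
	bool succeeded;	/* did any attempt complete without error? */
	int attempts;	/* how many times the command was issued */
};

/*
 * Simulate one command whose first `bad_attempts` tries fail.
 * Each attempt costs `attempt_cost` time units and the command is
 * abandoned once `timeout` units have elapsed.
 *
 * TOY_NEEDS_RETRY:    each failure consumes one of `max_retries`.
 * TOY_ADD_TO_MLQUEUE: failures are requeued without consuming retries,
 *                     so only the timeout bounds the loop.
 */
static struct toy_result toy_run(enum toy_disposition d, int bad_attempts,
				 int max_retries, int attempt_cost,
				 int timeout)
{
	struct toy_result r = { false, 0 };
	int retries_left = max_retries;
	int elapsed = 0;

	assert(attempt_cost > 0);	/* otherwise the loop could never time out */

	while (elapsed < timeout) {
		r.attempts++;
		elapsed += attempt_cost;

		if (r.attempts > bad_attempts) {
			r.succeeded = true;	/* link contention has cleared */
			break;
		}
		if (d == TOY_NEEDS_RETRY) {
			if (retries_left == 0)
				break;		/* budget exhausted: report the I/O error */
			retries_left--;
		}
	}
	return r;
}
```

With a condition that outlasts the retry budget (say six failing
attempts against three retries), the TOY_NEEDS_RETRY case gives up
after four attempts, while the TOY_ADD_TO_MLQUEUE case keeps going and
succeeds on the seventh, provided the timeout has not been reached
first.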


