Re: sd 6:0:0:0: [sdb] Unaligned partial completion

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Mon, 11 Jun 2018 14:51:20 -0700

[readd linux-scsi]
On Mon, 2018-06-11 at 14:43 -0700, Ted Cabeen wrote:
> On 06/11/2018 02:40 PM, James Bottomley wrote:
> > > I have also seen Aborted Command sense when doing heavy testing
> > > on one or more SAS disks behind a SAS expander. I put it down to
> > > a temporary lack of paths available (on the link between the
> > > host's HBA and the expander) when one of those SAS disks tries to
> > > get a connection back to the host with the data (data-in
> > > transfer) from an earlier READ command.
> > > 
> > > In my code (ddpt and sg_dd) I treat it as a "retry" type error
> > > and in my experience that works. IOW a follow-up READ with the
> > > same parameters is successful.
> > 
> > We do treat ABORTED_COMMAND as a retry.  However, it will tick down
> > the retry count (usually 3) and then fail if it still occurs.  How
> > long does this condition persist? If it's long-lived, we could
> > treat it as ADD_TO_MLQUEUE, which would mean we'd retry until the
> > timeout condition was reached.
> 
> When you retry, should that result in additional kernel messages, or 
> does the kernel message only appear after the 3 retries have all
> failed?

The latter: without enabling logging, we don't print anything for
successfully retried commands.

James