Re: [PATCH] sd: retry read_capacity on UNIT_ATTENTION

On Thu, 2010-04-08 at 09:36 +0200, Hannes Reinecke wrote:
> James Bottomley wrote:
> > On Thu, 2010-04-01 at 15:44 +0200, Hannes Reinecke wrote:
> >> Hazard testing uncovered yet another bug in sd. Under heavy
> >> reset activity the retry counter might be exhausted and
> >> the command will be returned with sense UNIT_ATTENTION/0x29/00
> >> (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
> >> cases we should just increase the retry counter again,
> >> retrying once more to clear up this Unit Attention state.
> >>
> >> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index 1962bea..7d75a21 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >>  		if (media_not_present(sdkp, &sshdr))
> >>  			return -ENODEV;
> >>  
> >> -		if (the_result)
> >> +		if (the_result) {
> >>  			sense_valid = scsi_sense_valid(&sshdr);
> >> +			if (sense_valid &&
> >> +			    sshdr.sense_key == UNIT_ATTENTION &&
> >> +			    sshdr.asc = 0x29 && sshdr.asq == 0x00)
> >                                       ^^^^
> > should be ==
> > 
> >> +			    /* Device reset might occur several times,
> >> +			     * give it one more chance */
> >> +			    retries++;
> >> +		}
> > 
> > Firstly, not even compile checked:
> > 
> > drivers/scsi/sd.c: In function ‘read_capacity_10’:
> > drivers/scsi/sd.c:1558: error: ‘struct scsi_sense_hdr’ has no member named ‘asq’
> > 
> D'oh.
> 
> > Secondly, we can't quite do this.  Some devices (only broken ones in my
> > experience) will reply UNIT_ATTENTION "I was RESET" forever, leading to
> > an endless loop here.  Additionally, a massive reset storm on a shared
> > bus would DoS the code here, so there must be a give-up point after a
> > reasonable number of retries.
> > 
> Hmm. yes.
> 
> > The third problem is that if this is happening to a large device, we
> > only catch it in RC10 ... so we'll report an undersized capacity if the
> > device is > SPC2.
> > 
> Okay. In the best of all worlds we would have a module parameter which
> would allow us to adjust this limit, as I fear the actual number of
> retries needed will depend on the number of devices connected.
> 
> But if you feel that's overkill it's fine by me, too.

A module parameter probably is overkill.  The usual retry limit for
ordinary commands is 5, so once we need more than that, even if we
survive READ_CAPACITY, we'll begin failing in the actual operational
commands like reads and writes (beginning with partition size).
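
To make the give-up point concrete, here is a rough user-space sketch of
a bounded retry loop.  It is not the patch under discussion: the names
issue_with_bounded_retries() and flaky_device(), and the CMD_RETRIES and
RESET_RETRIES values, are made up for illustration.  Only the sense
key/ASC/ASCQ triple comes from the thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define UNIT_ATTENTION 0x06   /* SCSI sense key */
#define CMD_RETRIES    3      /* hypothetical ordinary retry budget */
#define RESET_RETRIES  10     /* hypothetical cap on extra reset retries */

/* Simplified stand-in for the kernel's struct scsi_sense_hdr. */
struct sense_hdr {
	bool valid;
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/* Hypothetical command hook: returns 0 on success, nonzero on failure
 * and fills in *sshdr when sense data is available. */
typedef int (*issue_cmd_fn)(void *ctx, struct sense_hdr *sshdr);

/* POWER ON, RESET, OR BUS DEVICE RESET OCCURRED */
bool is_reset_unit_attention(const struct sense_hdr *s)
{
	return s->valid && s->sense_key == UNIT_ATTENTION &&
	       s->asc == 0x29 && s->ascq == 0x00;
}

/* Retry the command.  A reset Unit Attention does not consume the
 * ordinary budget, but only until the separate capped budget runs out,
 * so a device that reports "reset" forever cannot loop us forever. */
int issue_with_bounded_retries(issue_cmd_fn issue, void *ctx, int *attempts)
{
	int retries = CMD_RETRIES;
	int reset_retries = RESET_RETRIES;

	*attempts = 0;
	do {
		struct sense_hdr sshdr = { 0 };
		int result = issue(ctx, &sshdr);

		(*attempts)++;
		if (result == 0)
			return 0;

		if (is_reset_unit_attention(&sshdr) && reset_retries > 0) {
			reset_retries--;   /* free retry, but bounded */
			continue;
		}
		retries--;
	} while (retries > 0);

	return -1;
}

/* Simulated device: ctx points to an int counting how many more times
 * it will answer with the reset Unit Attention before succeeding. */
int flaky_device(void *ctx, struct sense_hdr *sshdr)
{
	int *remaining = ctx;

	if (*remaining <= 0)
		return 0;
	(*remaining)--;
	sshdr->valid = true;
	sshdr->sense_key = UNIT_ATTENTION;
	sshdr->asc = 0x29;
	sshdr->ascq = 0x00;
	return 1;
}
```

With these numbers a device that reports a reset five times succeeds on
the sixth attempt, while one that reports it forever is abandoned after
RESET_RETRIES + CMD_RETRIES attempts.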

> > How about this instead?
> > 
> Yes, that's better. Thanks.

OK, will put it in with your ack then.

James


