On Thu, 2010-04-08 at 09:36 +0200, Hannes Reinecke wrote:
> James Bottomley wrote:
> > On Thu, 2010-04-01 at 15:44 +0200, Hannes Reinecke wrote:
> >> Hazard testing uncovered yet another bug in sd. Under heavy
> >> reset activity the retry counter might be exhausted and
> >> the command will be returned with sense UNIT_ATTENTION/0x29/00
> >> (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
> >> cases we should just increase the retry counter again,
> >> retrying one more to clear up this Unit Attention state.
> >>
> >> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index 1962bea..7d75a21 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >>                  if (media_not_present(sdkp, &sshdr))
> >>                          return -ENODEV;
> >>
> >> -                if (the_result)
> >> +                if (the_result) {
> >>                          sense_valid = scsi_sense_valid(&sshdr);
> >> +                        if (sense_valid &&
> >> +                            sshdr.sense_key == UNIT_ATTENTION &&
> >> +                            sshdr.asc = 0x29 && sshdr.asq == 0x00)
> >                                        ^^^^
> >                                    should be ==
> >
> >> +                                /* Device reset might occur several times,
> >> +                                 * give it one more chance */
> >> +                                retries++;
> >> +                }
> >
> > Firstly, not even compile checked:
> >
> > drivers/scsi/sd.c: In function ‘read_capacity_10’:
> > drivers/scsi/sd.c:1558: error: ‘struct scsi_sense_hdr’ has no member named ‘asq’
> >
> D'oh.
>
> > Secondly, we can't quite do this.  Some devices (only broken ones in my
> > experience) will reply UNIT_ATTENTION I was RESET forever, leading to a
> > loop here.  Additionally, a massive reset storm on a shared bus would
> > DoS the code here, so there must be a give up point after a reasonable
> > number of retries.
> >
> Hmm. yes.
>
> > The third problem is that if this is happening to a large device, we
> > only catch it in RC10 ... so we'll report undersize if the device is
> > > SPC2
> >
> Okay.
> In the best of all worlds we would have a module parameter which
> would allow us to adjust this parameter, as I fear the actual number of
> retries will depend on the number of devices connected.
>
> But if you feel that's overkill it's fine by me, too.

A module parameter probably is overkill.  Once we get beyond 5, that's
the usual retry limit for ordinary commands, so even if we survive
beyond READ_CAPACITY, we'll begin failing in the actual operational
commands like reads and writes (beginning with partition size).

> > How about this instead?
> >
> Yes, that's better. Thanks.

OK, will put it in with your ack then.

James
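
For anyone reading this thread in the archive: the "give up point" idea
discussed above can be modelled in plain user-space C.  What follows is
only a sketch under stated assumptions, not the patch that was applied
and not kernel code.  The fake_dev and fake_sense types, the functions
fake_read_capacity() and read_capacity_bounded(), and the values of
BASE_RETRIES and MAX_RESET_RETRIES are all hypothetical names and numbers
chosen for illustration.  The point it tries to show is that a
UNIT ATTENTION 0x29/0x00 reply does not consume the ordinary retry
budget, but is itself capped, so a device that reports "I was reset"
forever cannot keep the loop spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNIT_ATTENTION    0x06  /* SCSI sense key */
#define MEDIUM_ERROR      0x03  /* SCSI sense key, stands in for any other failure */
#define BASE_RETRIES      3     /* hypothetical ordinary retry budget */
#define MAX_RESET_RETRIES 5     /* hypothetical cap on reset-triggered retries */

/* Hypothetical, simplified stand-in for decoded sense data. */
struct fake_sense {
	bool valid;
	unsigned char sense_key, asc, ascq;
};

/* Hypothetical simulated device. */
struct fake_dev {
	int resets_pending;  /* > 0: that many reset reports; < 0: forever */
	int errors_pending;  /* number of non-reset failures still to report */
	int commands_seen;   /* how many commands the device received */
};

/* Simulate one READ CAPACITY attempt; returns true on success. */
static bool fake_read_capacity(struct fake_dev *dev, struct fake_sense *sense)
{
	dev->commands_seen++;
	if (dev->resets_pending != 0) {
		if (dev->resets_pending > 0)
			dev->resets_pending--;
		/* POWER ON, RESET, OR BUS DEVICE RESET OCCURRED */
		*sense = (struct fake_sense){ true, UNIT_ATTENTION, 0x29, 0x00 };
		return false;
	}
	if (dev->errors_pending > 0) {
		dev->errors_pending--;
		*sense = (struct fake_sense){ true, MEDIUM_ERROR, 0x00, 0x00 };
		return false;
	}
	*sense = (struct fake_sense){ false, 0, 0, 0 };
	return true;
}

/* Returns 0 on success, -1 once either retry budget is exhausted. */
static int read_capacity_bounded(struct fake_dev *dev)
{
	int retries = BASE_RETRIES;
	int reset_retries = MAX_RESET_RETRIES;
	struct fake_sense sense;

	assert(dev != NULL);
	do {
		if (fake_read_capacity(dev, &sense))
			return 0;
		if (sense.valid && sense.sense_key == UNIT_ATTENTION &&
		    sense.asc == 0x29 && sense.ascq == 0x00) {
			/* A reset gets another chance, but only a bounded number. */
			if (--reset_retries > 0)
				continue;
			break;  /* give-up point: reset budget exhausted */
		}
		retries--;  /* any other failure uses the ordinary budget */
	} while (retries > 0);
	return -1;
}
```

In this model a device that reports four resets and then answers is
read successfully on the fifth command, while a device that reports
resets forever is abandoned after MAX_RESET_RETRIES commands.  The cap
of 5 simply echoes the "usual retry limit" mentioned above; a real
driver may well want a different value.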