On Thu, 2010-04-01 at 15:44 +0200, Hannes Reinecke wrote:
> Hazard testing uncovered yet another bug in sd. Under heavy
> reset activity the retry counter might be exhausted and
> the command will be returned with sense UNIT_ATTENTION/0x29/00
> (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
> cases we should just increase the retry counter again,
> retrying one more to clear up this Unit Attention state.
>
> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 1962bea..7d75a21 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  		if (media_not_present(sdkp, &sshdr))
>  			return -ENODEV;
>  
> -		if (the_result)
> +		if (the_result) {
>  			sense_valid = scsi_sense_valid(&sshdr);
> +			if (sense_valid &&
> +			    sshdr.sense_key == UNIT_ATTENTION &&
> +			    sshdr.asc = 0x29 && sshdr.asq == 0x00)
			              ^^^^ should be ==
> +				/* Device reset might occur several times,
> +				 * give it one more chance */
> +				retries++;
> +		}

Firstly, this wasn't even compile-checked:

drivers/scsi/sd.c: In function ‘read_capacity_10’:
drivers/scsi/sd.c:1558: error: ‘struct scsi_sense_hdr’ has no member named ‘asq’

Secondly, we can't quite do this. Some devices (only broken ones in my
experience) will keep replying UNIT_ATTENTION "I was RESET" forever,
leading to an infinite loop here. Additionally, a massive reset storm on
a shared bus would DoS the code here, so there must be a give-up point
after a reasonable number of retries.

The third problem is that if this is happening to a large device, we
only catch it in RC10 (READ CAPACITY(10)), not RC16, so we'll report an
undersized capacity if the device is > SPC2.

How about this instead?
James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7b75c8a..cdb8ed6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1432,6 +1432,8 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 #error RC16_LEN must not be more than SD_BUF_SIZE
 #endif
 
+#define READ_CAPACITY_RETRIES_ON_RESET 10
+
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
@@ -1439,7 +1441,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3;
+	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
@@ -1468,6 +1470,13 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * Invalid Field in CDB, just retry
 				 * silently with RC10 */
 				return -EINVAL;
+			if (sense_valid &&
+			    sshdr.sense_key == UNIT_ATTENTION &&
+			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
+				/* Device reset might occur several times,
+				 * give it one more chance */
+				if (--reset_retries > 0)
+					continue;
 		}
 		retries--;
 
@@ -1526,7 +1535,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3;
+	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	sector_t lba;
 	unsigned sector_size;
 
@@ -1542,8 +1551,16 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		if (media_not_present(sdkp, &sshdr))
 			return -ENODEV;
 
-		if (the_result)
+		if (the_result) {
 			sense_valid = scsi_sense_valid(&sshdr);
+			if (sense_valid &&
+			    sshdr.sense_key == UNIT_ATTENTION &&
+			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
+				/* Device reset might occur several times,
+				 * give it one more chance */
+				if (--reset_retries > 0)
+					continue;
+		}
 		retries--;
 
 	} while (the_result && retries);
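The two-counter retry loop in the patch can be exercised outside the kernel. What follows is a minimal user-space sketch, not sd.c code. It assumes a mock device that answers the first N commands with the reset UNIT ATTENTION (sense key 0x6, ASC/ASCQ 0x29/0x00) and then succeeds. The names `struct sense_hdr`, `mock_issue_command()`, `is_reset_ua()`, `read_with_retries()`, `CMD_RETRIES` and `RESET_RETRIES` are hypothetical stand-ins, not kernel API. A reset UA is retried without spending one of the three ordinary retries until the reset budget runs out. After that it falls through to the normal `retries--` accounting, so the loop always terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* SCSI constants (SPC): sense key 0x6 is UNIT ATTENTION; ASC/ASCQ
 * 0x29/0x00 is "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED". */
#define SENSE_KEY_UNIT_ATTENTION	0x6
#define ASC_RESET_OCCURRED		0x29

#define CMD_RETRIES	3	/* ordinary retries, as in the patch */
#define RESET_RETRIES	10	/* mirrors READ_CAPACITY_RETRIES_ON_RESET */

/* Hypothetical stand-in for the kernel's struct scsi_sense_hdr. */
struct sense_hdr {
	bool valid;
	unsigned char sense_key, asc, ascq;
};

/* Hypothetical mock device: answers the first *pending_resets commands
 * with the reset UNIT ATTENTION, then succeeds. Returns 0 on success. */
static int mock_issue_command(int *pending_resets, struct sense_hdr *sshdr)
{
	if (*pending_resets > 0) {
		(*pending_resets)--;
		sshdr->valid = true;
		sshdr->sense_key = SENSE_KEY_UNIT_ATTENTION;
		sshdr->asc = ASC_RESET_OCCURRED;
		sshdr->ascq = 0x00;
		return 1;
	}
	sshdr->valid = false;
	return 0;
}

/* True if the sense data reports "power on, reset or bus device reset". */
static bool is_reset_ua(const struct sense_hdr *s)
{
	return s->valid && s->sense_key == SENSE_KEY_UNIT_ATTENTION &&
	       s->asc == ASC_RESET_OCCURRED && s->ascq == 0x00;
}

/* Two-counter retry loop modelled on the patch. Returns the last
 * command result (0 = success) and stores the number of commands
 * issued in *attempts. */
static int read_with_retries(int pending_resets, int *attempts)
{
	int retries = CMD_RETRIES, reset_retries = RESET_RETRIES;
	struct sense_hdr sshdr;
	int result;

	assert(pending_resets >= 0);
	*attempts = 0;
	do {
		result = mock_issue_command(&pending_resets, &sshdr);
		(*attempts)++;
		/* A reset UA does not consume an ordinary retry while the
		 * reset budget lasts; in a do/while, "continue" jumps
		 * straight to the loop condition. */
		if (result && is_reset_ua(&sshdr) && --reset_retries > 0)
			continue;
		retries--;
	} while (result && retries);
	return result;
}
```

With a reset budget of 10, `--reset_retries > 0` grants 9 "free" retries, and the 3 ordinary retries follow. Even a device that reports the reset forever therefore sees at most 12 commands. For example, with 5 pending resets the read succeeds on the 6th command. With 1000 pending resets it gives up after the 12th.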