On 8/7/24 8:46 PM, Yihang Li wrote:
When a START_STOP command is sent to resume a scsi_device, it may be interrupted by exceptional operations such as a host reset or a PCI FLR. Once the START_STOP command fails, the runtime_status of the SCSI device is set to error, and it is difficult for the user to recover it.
How is the PCI FLR sent to the device? Shouldn't PCI FLRs only be triggered by the SCSI LLD from inside an error handler callback? How can a PCI FLR be triggered while a START STOP UNIT command is being processed? Why can PCI FLRs only be triggered while a START STOP UNIT command is being processed and not while any other command is being processed?
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5cd88a8eea73..29f30407d713 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -4088,9 +4088,20 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 {
 	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
 	struct scsi_sense_hdr sshdr;
+	struct scsi_failure failure_defs[] = {
+		{
+			.allowed = 3,
+			.result = SCMD_FAILURE_RESULT_ANY,
+		},
+		{}
+	};
+	struct scsi_failures failures = {
+		.failure_definitions = failure_defs,
+	};
 	const struct scsi_exec_args exec_args = {
 		.sshdr = &sshdr,
 		.req_flags = BLK_MQ_REQ_PM,
+		.failures = &failures,
 	};
 	struct scsi_device *sdp = sdkp->device;
 	int res;
The above change causes the START STOP UNIT command to be retried unconditionally. A START STOP UNIT command should not be retried unconditionally. Please take a look at the following patch series (posted yesterday): https://lore.kernel.org/linux-scsi/20240807203215.2439244-1-bvanassche@xxxxxxx/

Thanks,

Bart.