On 3/30/22 5:59 AM, Wenchao Hao wrote: > On 2022/3/30 17:32, Hannes Reinecke wrote: >> On 3/30/22 11:11, Wenchao Hao wrote: >>> On 2022/3/30 2:56, Hannes Reinecke wrote: >>>> On 3/29/22 14:40, Wenchao Hao wrote: >>>>> On 2022/3/29 18:56, Steffen Maier wrote: >>>>>> On 3/29/22 11:06, Wenchao Hao wrote: >>>>>>> SCSI timeout would call scsi_eh_scmd_add() on some conditions, host would be set >>>>>>> to SHOST_RECOVERY state. Once host enter SHOST_RECOVERY, IOs submitted to all >>>>>>> devices in this host would not succeed until the scsi_error_handler() finished. >>>>>>> The scsi_error_handler() might takes long time to be done, it's unbearable when >>>>>>> host has massive devices. >>>>>>> >>>>>>> I want to ask is anyone applying another error handler flow to address this >>>>>>> phenomenon? >>>>>>> >>>>>>> I think we can move some operations(like scsi get sense, scsi send startunit >>>>>>> and scsi device reset) out of scsi_unjam_host(), to perform these operations >>>>>>> without setting host to SHOST_RECOVERY? It would reduce the time of block the >>>>>>> whole host. >>>>>>> >>>>>>> Waiting for your discussion. >>>>>> >>>>>> We already have "async" aborts before even entering scsi_eh. So your use case seems to imply that those aborts fail and we enter scsi_eh? >>>>>> >>>>> >>>>> Yes, I mean when scsi_abort_command() failed and scsi_eh_scmd_add() is called. >>>>> >>>>>> There's eh_deadline for limiting the time spent in escalation of scsi_eh, and instead directly go to host reset. Would this help? >>>>>> >>>>>> >>>>> >>>>> The deadline seems not helpful. What we want to see is a single LUN's command error >>>>> would not stop other LUNs which share the same host. So my plan is to move reset LUN out >>>>> from scsi_unjam_host() which run with host set to SHOST_RECOVERY. >>>> >>>> Nope. One of the key points of scsi_unjam_host() is that is has to stop all I/O before proceeding. Without doing so basically all SCSI parallel HBAs will fail EH as they _require_ I/O to be stopped. >>>> >>> >>> I still can not understand why we must stop all I/O. In my comprehension, stopping all I/O >>> is because we might reset host during scsi_error_handler() and we must wait host's number of >>> failed command equal to number of busy command then we can wake up scsi_error_handler(). >>> >>> If move reset LUN out of scsi_error_handler(), and perform single LUN reset, we only need >>> stop I/O of this single LUN, this would not affect other LUNs. If single LUN reset failed, >>> we can then call in the large scale error handle. >>> >> I know the EH flow. >> >> Problem here is the way parallel SCSI operates. Remember, parallel SCSI is a _bus_, and there can be only one command at a time on the bus. >> So if one command on the bus misfires and you have to start EH you have to stop all I/O on the bus to ensure that your EH command is the only one active on the bus. >> > > Thank you for you explanation, it's clear to me now. > >> For modern HBAs we sure can device other ways and means of error recovery, but I can't really see how we would do that on legacy HBAs. >> > > How about define a new return value of scsi_host_template's eh_timed_out callback which indicate this timeout > is totally handled by LLDs. Like following: > > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -359,6 +359,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) > set_host_byte(scmd, DID_TIME_OUT); > scsi_eh_scmd_add(scmd); > } > + } else if (rtn == EH_HANDLED_BY_DRIVERS) { > + return BLK_EH_DONE; > } > > Or scsi_host_template's eh_timed_out should not do this, we can define another callback? > In the LLDs's timeout handler callback, apply single LUN reset first flow as previous mail metioned. > You probably want to add a scsi_host_template field or new callouts because some drivers only preallocate enough resources for one TMF at a time so we have to do it on a driver by driver basis. For driver's setting/implementing it then to escalate you can do something like: 1. Block the queue for just the LU with scsi_internal_device_block. 2. You can then call the new callout or old one if we just added some new field. 3. If device/lun reset fails, then block the target. Then you can do the host. We could share code with scsi_ioctl_reset as well. Drivers that support TMFs via that ioctl already expect queuecommand to be possibly in the middle of a run and IO not yet timed out. For example, the code to block a queue and reset the device could be used for the new EH and SG_SCSI_RESET_DEVICE handling. > Anyway, what we need is a way to reduce the time of setting host to SHOST_RECOVERY. > >>> Here is a brief flow: >>> >>> abort command >>> || >>> || failed >>> || >>> \/ >>> stop single LUN's I/O (need to wait LUN's failed command number equal to busy command number) >>> || >>> || failed (according to our statistic, 90% reset LUN would succeed) >>> || >>> \/ >> >> Interesting. This does not match up with my experience, where 99% of the errors were due to a command timeout. >> >> So which errors do you see here? What are the causes? > > These error statistic are from our consumers' environment,they told me about 90% timeout triggered errors can be > handled by reset LUN. > Is this with specific drivers, transport or targets?