On 2022/4/4 1:14, Mike Christie wrote: > On 3/30/22 5:59 AM, Wenchao Hao wrote: >> On 2022/3/30 17:32, Hannes Reinecke wrote: >>> On 3/30/22 11:11, Wenchao Hao wrote: >>>> On 2022/3/30 2:56, Hannes Reinecke wrote: >>>>> On 3/29/22 14:40, Wenchao Hao wrote: >>>>>> On 2022/3/29 18:56, Steffen Maier wrote: >>>>>>> On 3/29/22 11:06, Wenchao Hao wrote: >>>>>>>> SCSI timeout would call scsi_eh_scmd_add() on some conditions, host would be set >>>>>>>> to SHOST_RECOVERY state. Once host enter SHOST_RECOVERY, IOs submitted to all >>>>>>>> devices in this host would not succeed until the scsi_error_handler() finished. >>>>>>>> The scsi_error_handler() might takes long time to be done, it's unbearable when >>>>>>>> host has massive devices. >>>>>>>> >>>>>>>> I want to ask is anyone applying another error handler flow to address this >>>>>>>> phenomenon? >>>>>>>> >>>>>>>> I think we can move some operations(like scsi get sense, scsi send startunit >>>>>>>> and scsi device reset) out of scsi_unjam_host(), to perform these operations >>>>>>>> without setting host to SHOST_RECOVERY? It would reduce the time of block the >>>>>>>> whole host. >>>>>>>> >>>>>>>> Waiting for your discussion. >>>>>>> >>>>>>> We already have "async" aborts before even entering scsi_eh. So your use case seems to imply that those aborts fail and we enter scsi_eh? >>>>>>> >>>>>> >>>>>> Yes, I mean when scsi_abort_command() failed and scsi_eh_scmd_add() is called. >>>>>> >>>>>>> There's eh_deadline for limiting the time spent in escalation of scsi_eh, and instead directly go to host reset. Would this help? >>>>>>> >>>>>>> >>>>>> >>>>>> The deadline seems not helpful. What we want to see is a single LUN's command error >>>>>> would not stop other LUNs which share the same host. So my plan is to move reset LUN out >>>>>> from scsi_unjam_host() which run with host set to SHOST_RECOVERY. >>>>> >>>>> Nope. One of the key points of scsi_unjam_host() is that is has to stop all I/O before proceeding. Without doing so basically all SCSI parallel HBAs will fail EH as they _require_ I/O to be stopped. >>>>> >>>> >>>> I still can not understand why we must stop all I/O. In my comprehension, stopping all I/O >>>> is because we might reset host during scsi_error_handler() and we must wait host's number of >>>> failed command equal to number of busy command then we can wake up scsi_error_handler(). >>>> >>>> If move reset LUN out of scsi_error_handler(), and perform single LUN reset, we only need >>>> stop I/O of this single LUN, this would not affect other LUNs. If single LUN reset failed, >>>> we can then call in the large scale error handle. >>>> >>> I know the EH flow. >>> >>> Problem here is the way parallel SCSI operates. Remember, parallel SCSI is a _bus_, and there can be only one command at a time on the bus. >>> So if one command on the bus misfires and you have to start EH you have to stop all I/O on the bus to ensure that your EH command is the only one active on the bus. >>> >> >> Thank you for you explanation, it's clear to me now. >> >>> For modern HBAs we sure can device other ways and means of error recovery, but I can't really see how we would do that on legacy HBAs. >>> >> >> How about define a new return value of scsi_host_template's eh_timed_out callback which indicate this timeout >> is totally handled by LLDs. Like following: >> >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -359,6 +359,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) >> set_host_byte(scmd, DID_TIME_OUT); >> scsi_eh_scmd_add(scmd); >> } >> + } else if (rtn == EH_HANDLED_BY_DRIVERS) { >> + return BLK_EH_DONE; >> } >> >> Or scsi_host_template's eh_timed_out should not do this, we can define another callback? >> In the LLDs's timeout handler callback, apply single LUN reset first flow as previous mail metioned. >> > > You probably want to add a scsi_host_template field or new callouts > because some drivers only preallocate enough resources for one TMF > at a time so we have to do it on a driver by driver basis. > > For driver's setting/implementing it then to escalate you can do something > like: > > 1. Block the queue for just the LU with scsi_internal_device_block. > 2. You can then call the new callout or old one if we just added some > new field. > 3. If device/lun reset fails, then block the target. Then you can do > the host. > Thanks a lot for your guide. I want to wait hannes's change. > We could share code with scsi_ioctl_reset as well. Drivers that support > TMFs via that ioctl already expect queuecommand to be possibly in the > middle of a run and IO not yet timed out. For example, the code to > block a queue and reset the device could be used for the new EH and > SG_SCSI_RESET_DEVICE handling. > > > >> Anyway, what we need is a way to reduce the time of setting host to SHOST_RECOVERY. >> >>>> Here is a brief flow: >>>> >>>> abort command >>>> || >>>> || failed >>>> || >>>> \/ >>>> stop single LUN's I/O (need to wait LUN's failed command number equal to busy command number) >>>> || >>>> || failed (according to our statistic, 90% reset LUN would succeed) >>>> || >>>> \/ >>> >>> Interesting. This does not match up with my experience, where 99% of the errors were due to a command timeout. >>> >>> So which errors do you see here? What are the causes? >> >> These error statistic are from our consumers' environment,they told me about 90% timeout triggered errors can be >> handled by reset LUN. >> > > Is this with specific drivers, transport or targets? > . > Yes, it's based on specific hardware. Details about the statistic seems secret which can not send out.