[REQUEST DISCUSS]: speed up SCSI error handle for host with massive devices

Wenchao Hao <haowenchao@xxxxxxxxxx> · Tue, 29 Mar 2022 17:06:30 +0800

When a SCSI command times out, scsi_eh_scmd_add() may be called under
some conditions, which puts the host into the SHOST_RECOVERY state. Once
the host enters SHOST_RECOVERY, I/O submitted to any device on that host
cannot succeed until scsi_error_handler() has finished.
scsi_error_handler() can take a long time to complete, which is
unacceptable when the host has a large number of devices.

Is anyone working on a different error handling flow to address this
problem?

I think we could move some operations (such as getting sense data,
sending a start unit command, and resetting the device) out of
scsi_unjam_host(), so that they are performed without setting the host
to SHOST_RECOVERY. This would reduce the time during which the whole
host is blocked.
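
To make the idea concrete, here is a toy user-space model in C. It is
not kernel code: every toy_* name is made up for illustration, and it
only models "which devices can accept I/O", assuming a device-local
recovery flag could exist alongside the host state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT struct Scsi_Host / struct scsi_device. */
enum toy_host_state { TOY_HOST_RUNNING, TOY_HOST_RECOVERY };

struct toy_device {
	bool recovering;	/* device-local recovery in progress */
};

struct toy_host {
	enum toy_host_state state;
	struct toy_device *devs;
	size_t ndevs;
};

/* Models today's flow: one failed command puts the whole host in recovery. */
void toy_fail_host_wide(struct toy_host *h, size_t idx)
{
	assert(idx < h->ndevs);
	h->state = TOY_HOST_RECOVERY;
}

/* Models the proposed flow: only the failing device is marked. */
void toy_fail_device_only(struct toy_host *h, size_t idx)
{
	assert(idx < h->ndevs);
	h->devs[idx].recovering = true;
}

/* A device accepts I/O only if neither the host nor the device is recovering. */
bool toy_can_queue(const struct toy_host *h, size_t idx)
{
	assert(idx < h->ndevs);
	return h->state == TOY_HOST_RUNNING && !h->devs[idx].recovering;
}

/* Number of devices that currently cannot accept I/O. */
size_t toy_blocked_count(const struct toy_host *h)
{
	size_t blocked = 0;

	for (size_t i = 0; i < h->ndevs; i++)
		if (!toy_can_queue(h, i))
			blocked++;
	return blocked;
}
```

With a four-device host, toy_fail_device_only() on one device leaves
toy_blocked_count() at 1, while toy_fail_host_wide() raises it to 4.
The model says nothing about failures that really do need host-level
recovery, which would still have to fall back to SHOST_RECOVERY.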

Looking forward to your feedback.