Re: [REQUEST DISCUSS]: speed up SCSI error handle for host with massive devices

On 3/29/22 14:40, Wenchao Hao wrote:
On 2022/3/29 18:56, Steffen Maier wrote:
On 3/29/22 11:06, Wenchao Hao wrote:
A SCSI timeout calls scsi_eh_scmd_add() under some conditions, which sets the host to the SHOST_RECOVERY state. Once the host enters SHOST_RECOVERY, I/Os submitted to any device on this host will not succeed until scsi_error_handler() has finished. scsi_error_handler() might take a long time to complete, which is unbearable when the host has a massive number of devices.
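
To make the complaint concrete, here is a toy model of the behaviour described above. It is only a sketch: ToyHost, eh_scmd_add(), queue_command() and eh_done() are hypothetical Python stand-ins, not the kernel's structures or functions, and "not dispatched" is simplified to a boolean.

```python
from enum import Enum, auto


class HostState(Enum):
    RUNNING = auto()
    RECOVERY = auto()  # stands in for SHOST_RECOVERY


class ToyHost:
    """Toy stand-in for a SCSI host; not the kernel's struct Scsi_Host."""

    def __init__(self, num_luns):
        self.state = HostState.RUNNING
        self.luns = list(range(num_luns))

    def eh_scmd_add(self, lun):
        # One failed command on one LUN moves the whole host into recovery.
        self.state = HostState.RECOVERY

    def queue_command(self, lun):
        # The check is on the host state, not on the LUN that failed.
        return self.state is HostState.RUNNING

    def eh_done(self):
        # Error handling finished; the host accepts I/O again.
        self.state = HostState.RUNNING


host = ToyHost(num_luns=4)
host.eh_scmd_add(lun=0)  # only LUN 0 had a problem
blocked = [lun for lun in host.luns if not host.queue_command(lun)]
print(blocked)  # every LUN on the host is listed, not just LUN 0
```

In this model the cost of recovery is paid by every LUN on the host, which is why the time spent in the error handler matters more as the device count grows.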

I want to ask: is anyone using a different error handler flow to address this problem?

I think we can move some operations (such as SCSI get sense, SCSI send start unit, and SCSI device reset) out of scsi_unjam_host(), so that they are performed without setting the host to SHOST_RECOVERY. That would reduce the time for which the whole host is blocked.

Waiting for your discussion.

We already have "async" aborts before even entering scsi_eh. So your use case seems to imply that those aborts fail and we enter scsi_eh?


Yes, I mean the case where scsi_abort_command() fails and scsi_eh_scmd_add() is called.

There's eh_deadline, which limits the time spent in the scsi_eh escalation and goes directly to host reset instead. Would this help?
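
The effect of such a deadline can be sketched with a small simulation. This is a simplified model under stated assumptions, not the kernel code: the step names in ESCALATION follow the usual escalation order, while run_escalation() and the per-step costs are hypothetical and chosen only for illustration.

```python
# Escalation order, mildest step first.
ESCALATION = ["abort", "device_reset", "target_reset", "bus_reset", "host_reset"]


def run_escalation(step_cost, fixed_at, deadline=None):
    """Return (steps_tried, elapsed_seconds) for one recovery run.

    step_cost: seconds each step takes (hypothetical numbers)
    fixed_at:  name of the step that finally clears the fault
    deadline:  seconds after which we skip ahead to host reset;
               None disables the deadline
    """
    elapsed = 0
    tried = []
    for step in ESCALATION:
        past_deadline = deadline is not None and elapsed >= deadline
        if past_deadline and step != "host_reset":
            continue  # deadline expired: skip the intermediate steps
        tried.append(step)
        elapsed += step_cost[step]
        if step == fixed_at:
            break
    return tried, elapsed


costs = {step: 30 for step in ESCALATION}  # assumed 30 s per step

no_deadline = run_escalation(costs, fixed_at="host_reset")
with_deadline = run_escalation(costs, fixed_at="host_reset", deadline=45)
print(no_deadline)
print(with_deadline)
```

With these assumed costs the deadline shortens the run from five steps to three, but the host is still in recovery for the whole run.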



The deadline does not seem helpful. What we want is that a command error on a single LUN does not stop the other LUNs that share the same host. So my plan is to move the LUN reset out of scsi_unjam_host(), which runs with the host set to SHOST_RECOVERY.

Nope. One of the key points of scsi_unjam_host() is that it has to stop all I/O before proceeding. Without doing so, basically all SCSI parallel HBAs will fail EH, as they _require_ I/O to be stopped.
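
As a rough illustration of "stop all I/O before proceeding": the sketch below assumes the error handler only starts once every command still outstanding on the host has failed or timed out, so nothing is left in flight. eh_may_start() and its counters are hypothetical names loosely modelled on a host's busy and failed counts; this is not the kernel's actual check.

```python
def eh_may_start(host_busy, host_failed):
    """Return True once recovery may begin in this toy model.

    host_busy:   commands currently outstanding on the host
    host_failed: outstanding commands that have failed or timed out
    """
    # Recovery needs at least one failed command, and no command may
    # still be in flight (every outstanding command is a failed one).
    return host_failed > 0 and host_failed == host_busy


# One command timed out while seven others are still in flight.
waiting = eh_may_start(host_busy=8, host_failed=1)
# The seven others have completed; only the failed command remains.
ready = eh_may_start(host_busy=1, host_failed=1)
print(waiting, ready)
```

Under this assumption a single stuck command is enough to drain the whole host before recovery begins, regardless of which LUN it belongs to.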

And even on modern HBAs we have the challenge that 99% of all EH invocations are triggered by command timeouts, where 'LUN reset' is only of limited use.

Cheers,

Hannes