Re: [PATCH 0/7] scsi: EH rework main part

On 5/4/22 19:27, chenxiang (M) wrote:
Hi Hannes and other guys,

For SCSI EH, I have a question (sorry, it is not related to this patchset): in the current SCSI EH flow, if I/O to one disk fails (and there are many disks under the same SCSI host), all I/O to the whole SCSI host is blocked.

So during SCSI EH, all I/O is blocked even if some disks are healthy. That is what our product line sometimes complains about, as it blocks I/O to healthy disks because of just one bad disk during SCSI EH.

Is it possible to split SCSI EH into two parts, the process of recovering the disk and the process of recovering the SCSI host, at the beginning

If only it were that easy.
The biggest problem we're facing in SCSI EH is that basically _all_ instances I've seen where EH got engaged were due to a command timeout.

Which means that we've sent a command to the HBA and never heard from it again. Now, it would be easy if it were just the command that had vanished, but the problem is that we don't know what happened. The command might have been lost in transit, the drive might be unresponsive, or the HBA might have gone off the rails altogether. So until we've established where the command got lost, we have to assume the worst and _have_ to treat the HBA as unreliable. That means we can't isolate the device up front and simply hope the failure is restricted to it. Instead we have to stop I/O to the HBA, establish communication (typically by sending a TMF), and only restart operations once we get a response back from the HBA.
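
To make that ordering concrete, here is a toy model in Python. It is only a sketch and not kernel code: the names Fault, Host, send_tmf and handle_timeout are made up for illustration, and the real error handler has more escalation steps than the two resets shown here. The point is that the handler never looks at the actual fault; it only learns something from whether and how the TMF is answered, and I/O to the whole host stays blocked until then.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Fault(Enum):
    """Hypothetical ground truth that the error handler cannot observe."""
    COMMAND_LOST = auto()  # only the command vanished in transit
    DEVICE_HUNG = auto()   # the drive is unresponsive
    HBA_HUNG = auto()      # the HBA itself has stopped talking to us


@dataclass
class Host:
    """Toy stand-in for one SCSI host (HBA) with several disks behind it."""
    fault: Fault
    blocked: bool = False
    log: List[str] = field(default_factory=list)

    def send_tmf(self) -> Optional[bool]:
        """Pretend to send a task management function.

        Returns None if the HBA never answers, False if the HBA answers
        but the TMF failed (device not responding), True if it succeeded.
        """
        self.log.append("send TMF")
        if self.fault is Fault.HBA_HUNG:
            return None
        return self.fault is Fault.COMMAND_LOST

    def reset_device(self) -> None:
        self.log.append("reset device")

    def reset_host(self) -> None:
        self.log.append("reset host")


def handle_timeout(host: Host) -> List[str]:
    """All we know on entry is that a command timed out, not why."""
    # Assume the worst: stop I/O to every device behind this HBA.
    host.blocked = True
    host.log.append("block all I/O on host")

    # Establish communication; the reply is our only source of truth.
    reply = host.send_tmf()
    if reply is None:
        host.reset_host()      # HBA is silent, so nothing behind it is trusted
    elif reply is False:
        host.reset_device()    # HBA is alive, failure is limited to the device

    # Only now is it safe to restart operations.
    host.blocked = False
    host.log.append("resume I/O")
    return host.log
```

Calling handle_timeout(Host(Fault.DEVICE_HUNG)) walks through block, TMF, device reset, resume. Note that even in the mildest case, where only the command was lost, the block step still comes first, because at that point the handler cannot tell the three cases apart.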

This is especially true for old parallel SCSI HBAs, where quite a lot of state is kept in the HBA structure itself. So if we were to send another command we would lose the state of the failed command, and wouldn't be able to figure out the root cause of why the command had failed.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


