Re: [PATCH v2 0/8] scsi: Support to handle Intermittent errors

James Smart <jsmart2021@xxxxxxxxx> · Fri, 2 Oct 2020 10:27:48 -0700

On 10/2/2020 10:01 AM, Mike Christie wrote:
> On 9/27/20 11:50 PM, Muneendra wrote:
>> This patch adds support to prevent retries of all the pending/inflight
>> IOs after an abort succeeds on a particular device when transport
>> connectivity to the device is encountering intermittent errors.
>>
>> Intermittent connectivity is a condition that can be detected by transport
>> fabric notifications. A service can monitor the ELS notifications and
>> take action on all the outstanding IOs of a scsi device at that instant.
>
> Is the service mentioned above a new daemon or is it integrated into
> something like multipathd?
>
> What does the part about monitoring ELS notifications mean? Is the
> service just doing something like an ELS ECHO, or is it able to watch
> the IO on the wire/card (like if you did tcpdump and watched iscsi/tcp
> traffic) or is it something completely different?

For the last part: the FC drivers, when they receive an FC FPIN ELS,
call a scsi transport routine with the FPIN payload.  The transport
pushes this as an "event" via netlink.  An app bound to the local
address used by the scsi transport can receive the event and parse it.
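
To make the receive side concrete, here is a rough userspace sketch of
such a listener. It is not the daemon's code. The numeric constants and
the field layout are assumptions based on a reading of the kernel's
netlink and scsi_netlink_fc headers (struct fc_nl_event), so check them
against the headers on your system. The function names and the returned
dict are made up for illustration.

```python
import socket
import struct

# Assumed values from the kernel headers; verify before relying on them.
NETLINK_SCSITRANSPORT = 18       # netlink protocol used by the scsi transport
SCSI_NL_GRP_FC_EVENTS = 1 << 2   # multicast group carrying FC events
FCH_EVT_LINK_FPIN = 0x501        # event code assumed for FPIN events

# struct nlmsghdr: len, type, flags, seq, pid (native byte order)
NLMSG_HDR = struct.Struct("=IHHII")
# Assumed fixed part of struct fc_nl_event:
#   scsi_nl_hdr (version, transport, magic, msgtype, msglen),
#   seconds, vendor_id, host_no, event_datalen, event_num, event_code
FC_NL_EVENT = struct.Struct("=BBHHHQQHHII")


def parse_fc_event(msg):
    """Decode one netlink message into a dict, or None if it is too short."""
    if len(msg) < NLMSG_HDR.size + FC_NL_EVENT.size:
        return None
    body = msg[NLMSG_HDR.size:]
    (_version, _transport, _magic, _msgtype, _msglen, seconds, vendor_id,
     host_no, datalen, event_num, event_code) = FC_NL_EVENT.unpack_from(body)
    # The event data (for FPIN, the ELS payload) follows the fixed fields.
    data = body[FC_NL_EVENT.size:FC_NL_EVENT.size + datalen]
    return {
        "seconds": seconds,
        "vendor_id": vendor_id,
        "host_no": host_no,
        "event_num": event_num,
        "event_code": event_code,
        "data": data,
    }


def listen_for_fpin():
    """Bind to the FC event group and print every FPIN event seen."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         NETLINK_SCSITRANSPORT)
    # Netlink address is (pid, groups); pid 0 lets the kernel pick one.
    sock.bind((0, SCSI_NL_GRP_FC_EVENTS))
    while True:
        event = parse_fc_event(sock.recv(65535))
        if event and event["event_code"] == FCH_EVT_LINK_FPIN:
            print("FPIN on host%d, %d payload bytes"
                  % (event["host_no"], len(event["data"])))
```

Calling listen_for_fpin() on a Linux host blocks and prints one line per
FPIN event. Decoding the FPIN descriptors inside "data" and mapping them
to topology devices is left out here.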

This is a new daemon, specific to FC, which monitors for FPIN events, 
parses the related topology devices, then interacts with sysfs and 
possibly multipath based on what it's seeing from the fabric.

-- james