Re: Looking for some help understanding error handling

Hannes Reinecke <hare@xxxxxxx> · Mon, 22 Oct 2018 17:21:39 +0200

On 10/22/18 10:17 AM, John Garry wrote:
On 19/10/2018 17:46, Chris.Moore@xxxxxxxxxxxxx wrote:

-----Original Message-----
From: linux-scsi-owner@xxxxxxxxxxxxxxx <linux-scsi-owner@xxxxxxxxxxxxxxx>
On Behalf Of John Garry
Sent: Friday, October 19, 2018 2:19 AM
To: Chris Moore - C33997 <Chris.Moore@xxxxxxxxxxxxx>; hare@xxxxxxx;
linux-scsi@xxxxxxxxxxxxxxx; Jason Yan <yanaijie@xxxxxxxxxx>
Subject: Re: Looking for some help understanding error handling

On 05/10/2018 16:51, Chris.Moore@xxxxxxxxxxxxx wrote:
Thanks Hannes,

After some pointers from Shane Seymour I found that the FC and SRP
transport layers have a devloss timer, so that when a device
disappears they hold on to the target information for a time waiting
to see if it comes back.  The SAS transport layer doesn't have that feature.

The options for me then would be to modify scsi_transport_sas.c to
implement the devloss timeout, or to put that functionality into my LLDD.

I'm willing to put the work into the SAS transport and libsas, but I
suspect there's not a universal need for it.  And since my LLDD is for
internal use at our company and won't be upstreamed, I'll probably
just do the work there.  If anyone feels that this is a feature that more
people would want then I'll look into doing that.

Hello,

This feature sounds interesting for libsas. However, I have a question on the
feasibility of devloss here (note: I'm not familiar with the concept or its
realization for other standards): if a device is detached and re-attached, how
can we confirm that it is the same device? For a SAS device it's OK, as a disk
has the WWN, but what about SATA?

Thanks,
John

Would the serial number work?  I haven't worked a lot with SATA drives, but
ATA8-ACS says the IDENTIFY DEVICE response must contain a unique serial
number.
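
For illustration, a minimal sketch of pulling the serial number out of a
raw IDENTIFY DEVICE response. It assumes the caller already has the
512-byte data block in hand; the function name is hypothetical and this is
not libsas or libata code. The serial number sits in words 10-19 as 20
ASCII characters, with the two characters of each 16-bit word byte-swapped
in the buffer.

```python
def ata_serial_number(identify_data: bytes) -> str:
    """Return the serial number from a 512-byte IDENTIFY DEVICE block.

    Hypothetical helper for illustration only.
    """
    if len(identify_data) != 512:
        raise ValueError("IDENTIFY DEVICE data must be exactly 512 bytes")

    # Words 10-19 are bytes 20..39 of the data block.
    raw = identify_data[20:40]

    # ATA strings keep the first character of each pair in the high byte
    # of the word, so the bytes of every pair have to be swapped.
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]
    swapped[1::2] = raw[0::2]

    # The field is space-padded; drop the padding.
    return swapped.decode("ascii", errors="replace").strip()
```

Whether that string is a good enough identity to match a re-attached
device against is a separate question; the sketch only shows where the
field lives.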

I guess in principle it would be possible. The issue is that libsas does
not deal with topics like querying disks. It just deals with the SAS layers
below the application layer, like link and port management.

The underlying idea of the devloss mechanism is that the driver
maintains a _stable_ relationship between transport endpoints (e.g. FC
remote ports or SAS rports/rphy) and the target devices in the SCSI
subsystem. _And_ it is assumed that the LUN enumeration within the
target devices / endpoints remains stable during disconnect.
If these restrictions are met then the driver just has to reconnect the
rport (whose identification it has anyway) to the matching SCSI
target device.
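
As a rough model of that idea (plain Python, not kernel code; all class
and method names are made up for illustration, and time is passed in
explicitly to keep it deterministic): a lost rport leaves its target
blocked rather than removed, a returning rport with the same identity is
reattached to the same target, and only after the timeout is the target
torn down.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Target:
    """Stands in for a SCSI target; LUN numbering is assumed stable."""
    rport_id: str
    luns: Tuple[int, ...]
    blocked: bool = False
    lost_at: Optional[float] = None


class DevlossTracker:
    """Hypothetical bookkeeping for a devloss-style timeout."""

    def __init__(self, dev_loss_tmo: float):
        self.dev_loss_tmo = dev_loss_tmo
        self.targets: Dict[str, Target] = {}

    def rport_lost(self, rport_id: str, now: float) -> None:
        # Keep the target, but block I/O and start the devloss clock.
        target = self.targets.get(rport_id)
        if target is not None and not target.blocked:
            target.blocked = True
            target.lost_at = now

    def rport_added(self, rport_id: str, luns, now: float) -> Target:
        self.expire(now)  # never revive a target whose timer already ran out
        target = self.targets.get(rport_id)
        if target is not None:
            # Same identity came back in time: reconnect to the old target.
            target.blocked = False
            target.lost_at = None
            return target
        target = Target(rport_id, tuple(luns))
        self.targets[rport_id] = target
        return target

    def expire(self, now: float) -> List[str]:
        # Tear down every target that stayed lost for the full timeout.
        gone = [rid for rid, t in self.targets.items()
                if t.blocked and now - t.lost_at >= self.dev_loss_tmo]
        for rid in gone:
            del self.targets[rid]
        return gone
```

The whole scheme hinges on rport_id being a stable identity, which is
exactly the open question for SATA devices.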

Of course, if we can't identify the rport properly (as would be the case 
with SATA devices) we'll have to check if this mechanism can be used at all.
(But for STP devices you probably can use the port index, hoping that 
that won't change ...)

Cheers,

Hannes