Re: [PATCH 00/10] scsi: Fix internal host code use

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/4/22 1:55 AM, Oliver Neukum wrote:
> 
> 
> On 04.08.22 05:40, Mike Christie wrote:
>> The following patches made over Martin's 5.20 staging branch fix an issue
>> where we probably intended the host codes:
>>
>> DID_TARGET_FAILURE
>> DID_NEXUS_FAILURE
>> DID_ALLOC_FAILURE
>> DID_MEDIUM_ERROR
>>
>> to be internal to scsi-ml, but at some point drivers started using them
>> and the driver writers never updated scsi-ml.
> 
> Hi,
> 
> this approach drops useful information, though. If a device
> reports such specific an error condition, why not use that
> information

Is there a specific patch/case/code you are concerned?

I think in most cases the drivers were not using the correct
error code or they were stretching in trying to find a code
already.

The only ones that I thought were questionable were:

1. storvsc_drv: Used DID_TARGET_FAILURE for a local allocation
failure when they wanted to handle lun removal/scanning from a
worker thread.

I don't think DID_TARGET_FAILURE is right here. The driver wants
to just not retry this command. It's not really a perm target
failure like DID_TARGET_FAILURE is documented as. The failure
is just that the driver can't allocate some mem to perform lun
management.

I think either:

1. When we hit that failure path that we want to keep the 
DID_NO_CONNECT/DID_REQUEUE and not overwrite them.

Or

2. I used DID_BAD_TARGET to try and keep the spirit of their
DID_TARGET_FAILURE use where we couldn't handle an operation on
it's behalf. So the target itself is not bad but our processing
for it was so I thought it was close enough.

Note that I think the root issue is that the driver should
not be handling UAs and doing LUN scanning/removal and should
have added code to scsi-ml so it can be handled for everyone.
So really that code should not exist but that is a larger
change. I didn't want to add a new error code because of this.

2. uas: Used DID_TARGET_FAILURE when a TMF was not supported.
Again I don't think that code was right because it's not
a perm target failure. It is something that we don't want to
retry on another path but I don't think that comes up for
this driver ever.

I think DID_BAD_TARGET is ok'ish for this one. It's not a bad
target, but the target doesn't support what we needed and
DID_BAD_TARGET still conveys what we wanted and gives us the
same behavior.

3. cxlflash: DID_ALLOC_FAILURE was wrong in this case because
they wanted a retryable error. DID_ALLOC_FAILURE was for when
we are doing provisioning and couldn't allocate space on the
device, and is not retrable.

DID_ERROR gives them the behavior they want. It does lose info
but that's just how drivers ask scsi-ml to retry errors we don't
have codes for. We could add a new code but I don't think it
was worth it since we don't do that for every other driver and
their retryable errors. If there are drivers that have the same
issue then I'm for adding a new code.






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux