On 8/4/22 1:55 AM, Oliver Neukum wrote: > > > On 04.08.22 05:40, Mike Christie wrote: >> The following patches made over Martin's 5.20 staging branch fix an issue >> where we probably intended the host codes: >> >> DID_TARGET_FAILURE >> DID_NEXUS_FAILURE >> DID_ALLOC_FAILURE >> DID_MEDIUM_ERROR >> >> to be internal to scsi-ml, but at some point drivers started using them >> and the driver writers never updated scsi-ml. > > Hi, > > this approach drops useful information, though. If a device > reports such specific an error condition, why not use that > information Is there a specific patch/case/code you are concerned? I think in most cases the drivers were not using the correct error code or they were stretching in trying to find a code already. The only ones that I thought were questionable were: 1. storvsc_drv: Used DID_TARGET_FAILURE for a local allocation failure when they wanted to handle lun removal/scanning from a worker thread. I don't think DID_TARGET_FAILURE is right here. The driver wants to just not retry this command. It's not really a perm target failure like DID_TARGET_FAILURE is documented as. The failure is just that the driver can't allocate some mem to perform lun management. I think either: 1. When we hit that failure path that we want to keep the DID_NO_CONNECT/DID_REQUEUE and not overwrite them. Or 2. I used DID_BAD_TARGET to try and keep the spirit of their DID_TARGET_FAILURE use where we couldn't handle an operation on it's behalf. So the target itself is not bad but our processing for it was so I thought it was close enough. Note that I think the root issue is that the driver should not be handling UAs and doing LUN scanning/removal and should have added code to scsi-ml so it can be handled for everyone. So really that code should not exist but that is a larger change. I didn't want to add a new error code because of this. 2. uas: Used DID_TARGET_FAILURE when a TMF was not supported. Again I don't think that code was right because it's not a perm target failure. It is something that we don't want to retry on another path but I don't think that comes up for this driver ever. I think DID_BAD_TARGET is ok'ish for this one. It's not a bad target, but the target doesn't support what we needed and DID_BAD_TARGET still conveys what we wanted and gives us the same behavior. 3. cxlflash: DID_ALLOC_FAILURE was wrong in this case because they wanted a retryable error. DID_ALLOC_FAILURE was for when we are doing provisioning and couldn't allocate space on the device, and is not retrable. DID_ERROR gives them the behavior they want. It does lose info but that's just how drivers ask scsi-ml to retry errors we don't have codes for. We could add a new code but I don't think it was worth it since we don't do that for every other driver and their retryable errors. If there are drivers that have the same issue then I'm for adding a new code.