On 8/11/22 4:56 AM, Martin Wilck wrote:
>> It could change how 0x04/0x0a is handled because it uses NEEDS_RETRY.
>> However, scsi_dh_alua uses REQ_FAILFAST_DEV so we do not retry in
>> scsi_noretry_cmd like before.
>
> Not quite following you here - alua_check_sense() is called for any
> command, not just those submitted from the ALUA code.

Ah, I thought you had mentioned alua above because of your alua_rtpg
example. Above I was saying that there was no behavior change for your
alua_rtpg example because it uses REQ_FAILFAST_DEV.

>> 2. Instead of trying to make it general for all scsi_execute_users,
>> we can add SCMD bits for specific cases like DID_TIME_OUT or a SCMD
>> bit that tells scsi_noretry_cmd to not always fail passthrough
>> commands just because they are passthrough. It would work the
>> opposite of the FASTFAIL bits where instead of failing fast, we
>> retry.
>>
>> I think because the cases scsi_noretry_cmd is used for are really
>> specific cases (scsi_decide_disposition sees NEEDS_RETRY, retries <
>> allowed, and REQ_FAILFAST_DEV is not set) that might not be very
>> useful.
>
> I don't think it's _that_ specific. (retries < allowed) is the default
> case, at least for the first failure. REQ_FAILFAST_DEV has very few
> users except for the device handlers, and NEEDS_RETRY is a rather
> frequently used disposition.

I'm saying it's really specific because we only hit the code path that
is causing issues when scsi_check_sense returns NEEDS_RETRY. There are 5
of those in there and one in scsi_dh_alua. 4 of them are UAs. Compared
to all the sense errors that we check for in the scsi_execute callers,
including all the times they retry on any error, the 5 cases in
scsi_check_sense seemed really specific.

Let me send a patch for this type of design because in the other mail
Christoph was asking for more details. I originally started going that
route, so it won't be too much trouble to do an RFC so we can get an
idea of what it will look like.
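
To make the code path being discussed a bit more concrete, here is a
rough userspace sketch of the decision, not the kernel code itself. It
assumes scsi_check_sense (or alua_check_sense) has already returned
NEEDS_RETRY. All the toy_* names are made up for illustration, and
retry_override is a hypothetical stand-in for the "retry anyway" SCMD
bit from option 2. That the bit would only affect the passthrough check
and not the REQ_FAILFAST_DEV check is also an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the request state relevant to this thread. */
struct toy_req {
	bool failfast_dev;   /* models REQ_FAILFAST_DEV being set */
	bool passthrough;    /* models a passthrough (scsi_execute style) request */
	bool retry_override; /* hypothetical SCMD bit: do not fail just for being passthrough */
	int retries;         /* retries done so far */
	int allowed;         /* retries allowed */
};

/*
 * Models only the check-condition tail of scsi_noretry_cmd:
 * returns true when the command must not be retried.
 */
static bool toy_noretry(const struct toy_req *req)
{
	if (req->failfast_dev)
		return true;
	/* Assumption: the override bit only lifts the passthrough restriction. */
	if (req->passthrough && !req->retry_override)
		return true;
	return false;
}

/*
 * Models the maybe_retry step once the sense check said NEEDS_RETRY:
 * true means requeue for retry, false means complete to the caller.
 */
static bool toy_should_retry(const struct toy_req *req)
{
	return req->retries < req->allowed && !toy_noretry(req);
}
```

With this model, an alua_rtpg style request (failfast_dev set) is never
retried regardless of the new bit, a plain passthrough request is not
retried either, and only a passthrough request with the hypothetical
bit set gets requeued while retries < allowed.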