On 5/7/24 6:28 PM, Martin Wilck wrote:
> From: Rajashekhar M A <rajs@xxxxxxxxxx>
>
> When a host is configured with a few LUNs and IO is running,
> injecting FC faults repeatedly leads to path recovery problems.
> The LUNs have 4 paths each and 3 of them come back active after
> say an FC fault which makes two of the paths go down, instead of
> all 4. This happens after several iterations of continuous FC faults.
>
> Reason here is that we're returning an I/O error whenever we're
> encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.
>
> mwilck: Moved this code to alua_check_sense() as suggested by
> Mike Christie [1]. Evan Milne had raised the question whether pg->state
> should be set to transitioning in the UA case [2]. I believe that doing
> this is correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause
> I/O errors. Our handler schedules an RTPG, which will only result in
> an I/O error condition if the transitioning timeout expires.
>
> [1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@xxxxxxxxxx/
> [2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@xxxxxxxxxxxxxx/
>
> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
> Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
>
> ---
> Changes v2->v3:
>  - drop return value of alua_handle_state_transition() (Christoph Hellwig)
>  - handle UNIT ATTENTION in alua_tur(), too (Mike Christie)
>  - restore comment in alua_check_sense() (Damien Le Moal)
>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 33 +++++++++++++++-------
>  1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index a226dc1b65d7..c6408678e7c4 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -414,28 +414,40 @@ static char print_alua_state(unsigned char state)
>  	}
>  }
>
> -static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
> -					      struct scsi_sense_hdr *sense_hdr)
> +static void alua_handle_state_transition(struct scsi_device *sdev)
>  {
>  	struct alua_dh_data *h = sdev->handler_data;
>  	struct alua_port_group *pg;
>
> +	rcu_read_lock();
> +	pg = rcu_dereference(h->pg);
> +	if (pg)
> +		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> +	rcu_read_unlock();
> +	alua_check(sdev, false);
> +}
> +
> +static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
> +					      struct scsi_sense_hdr *sense_hdr)
> +{
>  	switch (sense_hdr->sense_key) {
>  	case NOT_READY:
> -		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a){

You removed the space before the curly bracket...
With that fixed, feel free to add:

Reviewed-by: Damien Le Moal <dlemoal@xxxxxxxxxx>

>  			/*
>  			 * LUN Not Accessible - ALUA state transition
>  			 */
> -			rcu_read_lock();
> -			pg = rcu_dereference(h->pg);
> -			if (pg)
> -				pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> -			rcu_read_unlock();
> -			alua_check(sdev, false);
> +			alua_handle_state_transition(sdev);
>  			return NEEDS_RETRY;
>  		}
>  		break;
>  	case UNIT_ATTENTION:
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
> +			/*
> +			 * LUN Not Accessible - ALUA state transition
> +			 */
> +			alua_handle_state_transition(sdev);
> +			return NEEDS_RETRY;
> +		}
>  		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
>  			/*
>  			 * Power On, Reset, or Bus Device Reset.
> @@ -502,7 +514,8 @@ static int alua_tur(struct scsi_device *sdev)
>
>  	retval = scsi_test_unit_ready(sdev, ALUA_FAILOVER_TIMEOUT * HZ,
>  				      ALUA_FAILOVER_RETRIES, &sense_hdr);
> -	if (sense_hdr.sense_key == NOT_READY &&
> +	if ((sense_hdr.sense_key == NOT_READY ||
> +	     sense_hdr.sense_key == UNIT_ATTENTION) &&
>  	    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
>  		return SCSI_DH_RETRY;
>  	else if (retval)

-- 
Damien Le Moal
Western Digital Research