On Fri, 2017-05-12 at 15:15 +0200, Martin Wilck wrote: > alua_rtpg() can race with alua_bus_detach(). The assertion that > alua_dh_data *h->sdev must be non-NULL is not guaranteed because > alua_bus_detach sets this field to NULL before removing the entry > from the port group's dh_list. > > This happens when a device is about to be removed, so don't BUG out > but continue silently. > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > Reviewed-by: Hannes Reinecke <hare@xxxxxxxx> > --- > drivers/scsi/device_handler/scsi_dh_alua.c | 20 ++++++++++++++------ > 1 file changed, 14 insertions(+), 6 deletions(-) > > diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c > index 2b60f493f90e..a59783020c66 100644 > --- a/drivers/scsi/device_handler/scsi_dh_alua.c > +++ b/drivers/scsi/device_handler/scsi_dh_alua.c > @@ -652,9 +652,15 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg) > rcu_read_lock(); > list_for_each_entry_rcu(h, > &tmp_pg->dh_list, node) { > - /* h->sdev should always be valid */ > - BUG_ON(!h->sdev); > - h->sdev->access_state = desc[0]; > + /* > + * We might be racing with > + * alua_bus_detach here > + */ > + struct scsi_device *sdev = > + h->sdev; > + if (sdev) > + sdev->access_state = > + desc[0]; > } > rcu_read_unlock(); > } > @@ -694,11 +700,13 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg) > pg->expiry = 0; > rcu_read_lock(); > list_for_each_entry_rcu(h, &pg->dh_list, node) { > - BUG_ON(!h->sdev); > - h->sdev->access_state = > + struct scsi_device *sdev = h->sdev; > + if (!sdev) > + continue; > + sdev->access_state = > (pg->state & SCSI_ACCESS_STATE_MASK); > if (pg->pref) > - h->sdev->access_state |= > + sdev->access_state |= > SCSI_ACCESS_STATE_PREFERRED; > } > rcu_read_unlock(); Hello Martin, Allowing races like the one this patch tries to address to exist makes the ALUA code harder to maintain than necessary. Have you considered to make alua_bus_detach() wait until ALUA work has finished by using e.g. cancel_work_sync() or rcu_synchronize()? Bart.