On Thu, 2020-09-10 at 14:22 -0700, Brian Bunker wrote: > A race exists where the BUG_ON(!h->sdev) will fire if the detach > device handler > from one thread runs removing a list entry while another thread is > trying to > evaluate the target portal group state. > > The order of the detach operation is now changed to delete the list > entry > before modifying the pointer and setting h->sdev to NULL. > > Signed-off-by: Brian Bunker <brian@xxxxxxxxxxxxxxx> > Acked-by: Krishna Kant <krishna.kant@xxxxxxxxxxxxxxx> > ___ > diff -Naur a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c > b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c > --- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c 2020-09-10 > 12:29:03.000000000 -0700 > +++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c 2020-09-10 > 12:41:34.000000000 -0700 > @@ -1146,16 +1146,18 @@ > > spin_lock(&h->pg_lock); > pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h- > >pg_lock)); > - rcu_assign_pointer(h->pg, NULL); > - h->sdev = NULL; > - spin_unlock(&h->pg_lock); > if (pg) { > spin_lock_irq(&pg->lock); > list_del_rcu(&h->node); > spin_unlock_irq(&pg->lock); > - kref_put(&pg->kref, release_port_group); > } > + rcu_assign_pointer(h->pg, NULL); > + h->sdev = NULL; > + spin_unlock(&h->pg_lock); > sdev->handler_data = NULL; > + if (pg) { > + kref_put(&pg->kref, release_port_group); > + } > kfree(h); > } > Good catch. This makes the code hold the h->pg_lock while holding the pg->lock though. It seems like all that is needed is to remove the h->sdev = NULL assignment, since it has to remain valid until the alua_ah_data is removed from the pg->dh_list, and the object is going to be freed right afterwards anyway? Might as well also remove the BUG_ON(!h->sdev) in 2 places since the kernel will crash when h is dereferenced anyway if it is NULL. -Ewan