Re: [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry

"Ewan D. Milne" <emilne@xxxxxxxxxx> · Fri, 18 Sep 2020 14:41:31 -0400

On Fri, 2020-09-11 at 09:21 -0700, Brian Bunker wrote:
> A race exists where the BUG_ON(!h->sdev) will fire if the detach device
> handler from one thread runs, removing a list entry, while another thread
> is trying to evaluate the target portal group state.
> 
> Do not set h->sdev to NULL in the detach device handler. It is freed at
> the end of the function anyway. Also remove the BUG_ON checks, since the
> condition that causes them to fire has been removed.
> 
> Signed-off-by: Brian Bunker <brian@xxxxxxxxxxxxxxx>
> Acked-by: Krishna Kant <krishna.kant@xxxxxxxxxxxxxxx>
> ___
> --- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
> +++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-11 09:14:15.000000000 -0700
> @@ -658,8 +658,6 @@
>                                         rcu_read_lock();
>                                         list_for_each_entry_rcu(h,
>                                                 &tmp_pg->dh_list, node) {
> -                                               /* h->sdev should always be valid */
> -                                               BUG_ON(!h->sdev);
>                                                 h->sdev->access_state = desc[0];
>                                         }
>                                         rcu_read_unlock();
> @@ -705,7 +703,6 @@
>                         pg->expiry = 0;
>                         rcu_read_lock();
>                         list_for_each_entry_rcu(h, &pg->dh_list, node) {
> -                               BUG_ON(!h->sdev);
>                                 h->sdev->access_state =
>                                         (pg->state & SCSI_ACCESS_STATE_MASK);
>                                 if (pg->pref)
> @@ -1147,7 +1144,6 @@
>         spin_lock(&h->pg_lock);
>         pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
>         rcu_assign_pointer(h->pg, NULL);
> -       h->sdev = NULL;
>         spin_unlock(&h->pg_lock);
>         if (pg) {
>                 spin_lock_irq(&pg->lock);
> 

We ran this change through fault-insertion regression testing and
did not see any new problems.  (Our tests did not hit the original
bug being fixed here, though.)

-Ewan