Re: [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry

Hannes Reinecke <hare@xxxxxxx> · Wed, 23 Sep 2020 10:23:07 +0200

On 9/19/20 12:07 AM, Brian Bunker wrote:
Internally we had discussions about adding a synchronize_rcu here:

--- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
+++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-18 14:59:59.000000000 -0700
@@ -658,8 +658,6 @@
                                         rcu_read_lock();
                                         list_for_each_entry_rcu(h,
                                                 &tmp_pg->dh_list, node) {
-                                               /* h->sdev should always be valid */
-                                               BUG_ON(!h->sdev);
                                                 h->sdev->access_state = desc[0];
                                         }
                                         rcu_read_unlock();
@@ -705,7 +703,6 @@
                         pg->expiry = 0;
                         rcu_read_lock();
                         list_for_each_entry_rcu(h, &pg->dh_list, node) {
-                               BUG_ON(!h->sdev);
                                 h->sdev->access_state =
                                         (pg->state & SCSI_ACCESS_STATE_MASK);
                                 if (pg->pref)
@@ -1147,7 +1144,6 @@
         spin_lock(&h->pg_lock);
         pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
         rcu_assign_pointer(h->pg, NULL);
-       h->sdev = NULL;
         spin_unlock(&h->pg_lock);
         if (pg) {
                 spin_lock_irq(&pg->lock);
@@ -1156,6 +1152,7 @@
                 kref_put(&pg->kref, release_port_group);
         }
         sdev->handler_data = NULL;
+       synchronize_rcu();
         kfree(h);
  }

We were thinking that this would allow any running readers to finish before the object is freed. Would
that make sense there? From what I can tell the only RCU implementation in this kernel version in
scsi is here, so I couldn’t find many examples to follow.
  That actually looks quite viable (if the intendation gets fixed :-)
Initially I was thinking of using 'h->sdev = NULL' as a marker for the 
other code to figure out that the device has about to be deleted; that 
worked, but with rather unintended consequences.

Have you tried with your patch?

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer