On Tue, 2020-09-15 at 16:28 +0530, Jitendra Khasdev wrote: > This is patch to fix the race occurs between bus detach and alua_rtpg. > > It fluses the all pending workqueue in bus detach handler, so it can avoid > race between alua_bus_detach and alua_rtpg. > > Here is call trace where race got detected. > > multipathd call stack: > [exception RIP: native_queued_spin_lock_slowpath+100] > --- <NMI exception stack> --- > native_queued_spin_lock_slowpath at ffffffff89307f54 > queued_spin_lock_slowpath at ffffffff89307c18 > _raw_spin_lock_irq at ffffffff89bd797b > alua_bus_detach at ffffffff8984dcc8 > scsi_dh_release_device at ffffffff8984b6f2 > scsi_device_dev_release_usercontext at ffffffff89846edf > execute_in_process_context at ffffffff892c3e60 > scsi_device_dev_release at ffffffff8984637c > device_release at ffffffff89800fbc > kobject_cleanup at ffffffff89bb1196 > kobject_put at ffffffff89bb12ea > put_device at ffffffff89801283 > scsi_device_put at ffffffff89838d5b > scsi_disk_put at ffffffffc051f650 [sd_mod] > sd_release at ffffffffc051f8a2 [sd_mod] > __blkdev_put at ffffffff8952c79e > blkdev_put at ffffffff8952c80c > blkdev_close at ffffffff8952c8b5 > __fput at ffffffff894e55e6 > ____fput at ffffffff894e57ee > task_work_run at ffffffff892c94dc > exit_to_usermode_loop at ffffffff89204b12 > do_syscall_64 at ffffffff892044da > entry_SYSCALL_64_after_hwframe at ffffffff89c001b8 > > kworker: > [exception RIP: alua_rtpg+2003] > account_entity_dequeue at ffffffff892e42c1 > alua_rtpg_work at ffffffff8984f097 > process_one_work at ffffffff892c4c29 > worker_thread at ffffffff892c5a4f > kthread at ffffffff892cb135 > ret_from_fork at ffffffff89c00354 > > Signed-off-by: Jitendra Khasdev <jitendra.khasdev@xxxxxxxxxx> > --- > drivers/scsi/device_handler/scsi_dh_alua.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c > index f32da0c..024a752 100644 > --- a/drivers/scsi/device_handler/scsi_dh_alua.c > +++ b/drivers/scsi/device_handler/scsi_dh_alua.c > @@ -1144,6 +1144,9 @@ static void alua_bus_detach(struct scsi_device *sdev) > struct alua_dh_data *h = sdev->handler_data; > struct alua_port_group *pg; > > + sdev_printk(KERN_INFO, sdev, "%s: flushing workqueues\n", ALUA_DH_NAME); > + flush_workqueue(kaluad_wq); > + > spin_lock(&h->pg_lock); > pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock)); > rcu_assign_pointer(h->pg, NULL); I'm not sure this is the best solution. The current code references h->sdev when the dh_list is traversed. So it needs to remain valid. Fixing it by flushing the workqueue to avoid the list traversal code running leaves open the possibility that future code alterations may expose this problem again. -Ewan