Re: [PATCH] scsi: alua: fix the race between alua_bus_detach and alua_rtpg

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 9/17/20 11:00 PM, Ewan D. Milne wrote:
> On Tue, 2020-09-15 at 16:28 +0530, Jitendra Khasdev wrote:
>> This is patch to fix the race occurs between bus detach and alua_rtpg.
>>
>> It fluses the all pending workqueue in bus detach handler, so it can avoid
>> race between alua_bus_detach and alua_rtpg.
>>
>> Here is call trace where race got detected.
>>
>> multipathd call stack:
>> [exception RIP: native_queued_spin_lock_slowpath+100]
>> --- <NMI exception stack> ---
>> native_queued_spin_lock_slowpath at ffffffff89307f54
>> queued_spin_lock_slowpath at ffffffff89307c18
>> _raw_spin_lock_irq at ffffffff89bd797b
>> alua_bus_detach at ffffffff8984dcc8
>> scsi_dh_release_device at ffffffff8984b6f2
>> scsi_device_dev_release_usercontext at ffffffff89846edf
>> execute_in_process_context at ffffffff892c3e60
>> scsi_device_dev_release at ffffffff8984637c
>> device_release at ffffffff89800fbc
>> kobject_cleanup at ffffffff89bb1196
>> kobject_put at ffffffff89bb12ea
>> put_device at ffffffff89801283
>> scsi_device_put at ffffffff89838d5b
>> scsi_disk_put at ffffffffc051f650 [sd_mod]
>> sd_release at ffffffffc051f8a2 [sd_mod]
>> __blkdev_put at ffffffff8952c79e
>> blkdev_put at ffffffff8952c80c
>> blkdev_close at ffffffff8952c8b5
>> __fput at ffffffff894e55e6
>> ____fput at ffffffff894e57ee
>> task_work_run at ffffffff892c94dc
>> exit_to_usermode_loop at ffffffff89204b12
>> do_syscall_64 at ffffffff892044da
>> entry_SYSCALL_64_after_hwframe at ffffffff89c001b8
>>
>> kworker:
>> [exception RIP: alua_rtpg+2003]
>> account_entity_dequeue at ffffffff892e42c1
>> alua_rtpg_work at ffffffff8984f097
>> process_one_work at ffffffff892c4c29
>> worker_thread at ffffffff892c5a4f
>> kthread at ffffffff892cb135
>> ret_from_fork at ffffffff89c00354
>>
>> Signed-off-by: Jitendra Khasdev <jitendra.khasdev@xxxxxxxxxx>
>> ---
>>  drivers/scsi/device_handler/scsi_dh_alua.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
>> index f32da0c..024a752 100644
>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>> @@ -1144,6 +1144,9 @@ static void alua_bus_detach(struct scsi_device *sdev)
>>  	struct alua_dh_data *h = sdev->handler_data;
>>  	struct alua_port_group *pg;
>>  
>> +	sdev_printk(KERN_INFO, sdev, "%s: flushing workqueues\n", ALUA_DH_NAME);
>> +	flush_workqueue(kaluad_wq);
>> +
>>  	spin_lock(&h->pg_lock);
>>  	pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
>>  	rcu_assign_pointer(h->pg, NULL);
> 
> I'm not sure this is the best solution.  The current code
> references h->sdev when the dh_list is traversed.  So it needs
> to remain valid.  Fixing it by flushing the workqueue to avoid
> the list traversal code running leaves open the possibility that
> future code alterations may expose this problem again.
> 
> -Ewan
> 
> 

I see your point, but as we are in detach handler and this code path only execute when device is being detached. So, before detaching, flush work-queue will take care of any current code references h->sdev where dh_list is being traversed.

IMO, I do not think it would create any problem for future code alterations. Or may be I am missing something over here, what could be possible scenario for that?

---
Jitendra



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux