Re: [PATCH] scsi: take module reference during async scan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/7/20 7:46 PM, James Bottomley wrote:
> On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
>> During an async scan the driver shost->hostt structures are used,
>> that may cause issues when the driver is removed at that time.
>> As protection take the module reference.
> 
> Can I just ask what issues?  Today, our module model is that
> scsi_device_get() bumps the module refcount and therefore makes the
> module ineligible to be removed.  scsi_host_get() doesn't do this
> because the way the host model is supposed to be coded, we can call
> remove at any time but the module won't get freed until the last put of
> the host.  I can see we have a potential problem with
> scsi_forget_host() racing with the async scan thread ... is that what
> you see? What's supposed to happen is that scsi_device_get() starts
> failing as soon as the module begins it's exit routine, so if a scan is
> in progress, it can't add any new devices ... in theory this means that
> the list is stable for scsi_forget_host(), so knowing how that
> assumption is breaking would be useful.

I think that the problem is that async scan uses callbacks to the module
and when the module is being removed during scan it is not protected.

modprobe mpt3sas && rmmod  mpt3sas

[  370.031614] INFO: task rmmod:3120 blocked for more than 120 seconds.
[  370.037967]       Not tainted 4.18.0-193.el8.x86_64 #1
[  370.043105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  370.050931] rmmod           D    0  3120   2460 0x00004080
[  370.056414] Call Trace:
[  370.058889]  ? __schedule+0x24f/0x650
[  370.062554]  schedule+0x2f/0xa0
[  370.065738]  async_synchronize_cookie_domain+0xad/0x140
[  370.070983]  ? finish_wait+0x80/0x80
[  370.074580]  __x64_sys_delete_module+0x166/0x280
[  370.079198]  do_syscall_64+0x5b/0x1a0
[  370.082876]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  370.087946] RIP: 0033:0x7f6de460a7db
[  370.091534] Code: Bad RIP value.
[  370.094777] RSP: 002b:00007ffe9971e798 EFLAGS: 00000206 ORIG_RAX:
00000000000000b0
[  370.102341] RAX: ffffffffffffffda RBX: 00005592370d37b0 RCX:
00007f6de460a7db
[  370.109481] RDX: 000000000000000a RSI: 0000000000000800 RDI:
00005592370d3818
[  370.116606] RBP: 0000000000000000 R08: 00007ffe9971d711 R09:
0000000000000000
[  370.123748] R10: 00007f6de467c8e0 R11: 0000000000000206 R12:
00007ffe9971e9c0
[  370.130888] R13: 00007ffe99720333 R14: 00005592370d32a0 R15:
00005592370d37b0


> 
> James
> 




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux