Re: [PATCH] scsi: take module reference during async scan

On 2020-09-07 1:46 p.m., James Bottomley wrote:
> On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
>> During an async scan the driver shost->hostt structures are used,
>> that may cause issues when the driver is removed at that time.
>> As protection take the module reference.
>
> Can I just ask what issues?  Today, our module model is that
> scsi_device_get() bumps the module refcount and therefore makes the
> module ineligible to be removed.  scsi_host_get() doesn't do this
> because the way the host model is supposed to be coded, we can call
> remove at any time but the module won't get freed until the last put of
> the host.  I can see we have a potential problem with
> scsi_forget_host() racing with the async scan thread ... is that what
> you see? What's supposed to happen is that scsi_device_get() starts
> failing as soon as the module begins its exit routine, so if a scan is
> in progress, it can't add any new devices ... in theory this means that
> the list is stable for scsi_forget_host(), so knowing how that
> assumption is breaking would be useful.
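
For readers following along, the refcount model described above can be
sketched as a toy userspace program. This is only an illustration under
stated assumptions: every toy_* name is hypothetical and is not a kernel
API, the code is single-threaded, and it deliberately does not model the
scsi_forget_host() vs. async scan race under discussion. It only shows
the intended rule that a device "get" pins the module and starts failing
once the module has begun to exit.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_module_state { TOY_MODULE_LIVE, TOY_MODULE_GOING };

struct toy_module {
	enum toy_module_state state;
	int refcount;
};

/* Loosely models the idea of try_module_get(): refuse once exit has begun. */
static bool toy_module_try_get(struct toy_module *mod)
{
	if (mod->state == TOY_MODULE_GOING)
		return false;
	mod->refcount++;
	return true;
}

static void toy_module_put(struct toy_module *mod)
{
	mod->refcount--;
}

/* Exit may only begin when nobody holds a module reference. */
static bool toy_module_begin_exit(struct toy_module *mod)
{
	if (mod->refcount > 0)
		return false;
	mod->state = TOY_MODULE_GOING;
	return true;
}

/* Hypothetical stand-in for a device get: it pins the owning module. */
static bool toy_device_get(struct toy_module *mod)
{
	return toy_module_try_get(mod);
}

/*
 * Walk through the sequence from the text. Returns the number of devices
 * a "scan" managed to add, or -1 if the model behaved unexpectedly.
 */
int toy_demo(void)
{
	struct toy_module mod = { .state = TOY_MODULE_LIVE, .refcount = 0 };
	int added = 0;

	/* Scan adds a device while the module is live: the get succeeds. */
	if (toy_device_get(&mod))
		added++;

	/* While that reference is held, the module cannot start exiting. */
	if (toy_module_begin_exit(&mod))
		return -1;

	/* Drop the reference; now exit is allowed to begin. */
	toy_module_put(&mod);
	if (!toy_module_begin_exit(&mod))
		return -1;

	/* A scan still in progress can no longer add devices. */
	if (toy_device_get(&mod))
		added++;

	return added;
}
```

In this toy, toy_demo() counts only the device added before exit began;
the late get is refused, which is the property that is supposed to keep
the device list stable for scsi_forget_host().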

James,
If you think it is bulletproof, try using CONFIG_DEBUG_TEST_DRIVER_REMOVE=y.
John Garry reported that:

 # insmod scsi_debug.ko

Gave errors like this:

[  140.115244] debugfs: Directory 'sde' with parent 'block' already present!
[  140.376426] debugfs: Directory 'sde' with parent 'block' already present!
[  140.420613] sd 3:0:0:0: [sde] tag#40 access beyond end of device
[  140.426655] blk_update_request: I/O error, dev sde, sector 15984 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  140.437319] sd 3:0:0:0: [sde] tag#41 access beyond end of device
[  140.443368] blk_update_request: I/O error, dev sde, sector 15984 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
...

That wasn't caused by the scsi_debug driver directly, as it doesn't use
debugfs. So I suspect something is rotten in the mid-level.

When I tried to replicate John's config I couldn't even boot my Ubuntu
20.04 based system (with a MKP kernel). It seemed to fail or lock up
before any kernel messages reached the serial port (yes, still useful),
perhaps in the initrd. I'm guessing another, non-SCSI module caused the
lockup, so I gave up and turned off that config setting.

Doug Gilbert




