Re: [PATCH] Waiting for scsi_host_template release

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sent out a new patch with "bsg referencing bus driver module" in subj.

On Wed, Apr 11, 2018 at 4:37 PM, Anatoliy Glagolev <glagolig@xxxxxxxxx> wrote:
> On "what was the actual error": it is deref of an invalid address, not
> NULL. Attaching crash dump analysis for the reference.
>
> On module reference count: good point. I decided against it at first,
> but I can reconsider. "modprobe -r qla2xxx" will fail if there is an
> extra reference to the module, and the module_exit function will not
> even run, right? Waiting for references to go away would be more
> convenient for me. But I can see why the module reference count is a
> better approach in general. I can work around and retry "modprobe -r
> qla2xxx" multiple times in my scripts.
>
> I think that it is still a SCSI mid-layer job to do the references.
> There is no way qla2xxx can reference itself and then dereference at
> the right time.
>
> qla2xxx (or any other driver) provides a pointer to its module in
> scsi_host_template when it requests Scsi_Host creation. As far as I
> can see, no one ever takes a reference on that module. SCSI mid-layer
> just relies on the module to be around. Scsi_Host is a device itself;
> that is the device that is referenced on open/close from user mode,
> and not the bus driver that triggered the Scsi_Host creation.
>
> SCSI mid layer taking a reference on the template's module at
> Scsi_Host creation in scsi_host_alloc(..) and dropping it in
> scsi_host_dev_release (called when the last reference to Scsi_Host is
> gone) will not work. Assuming that the module_exit function does not
> run at an attempt to unload a referenced module, qla2xxx's Scsi_Host-s
> corresponding to the adapter's ports will stay forever.
>
> Let me think more about it; the idea is to intercept open/close at
> Scsi_Host and increment/decrement module reference at that time.
>
> Thanks a lot for the input!
>
>
> On Wed, Apr 11, 2018 at 1:12 PM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, 2018-04-11 at 12:22 -0700, Anatoliy Glagolev wrote:
>>> Hannes, James, thanks a lot for taking a look!
>>>
>>> On the problem the patch is solving: it is in the "Description" part
>>> of my initial e-mail. If you agree that a Scsi_Host may be around
>>> after a driver has unloaded, the problem applies to any driver
>>> creating a new Scsi_Host.
>>
>> No, I don't agree: as I said, the template is part of the module and
>> the module should be reference counted.  Any use after free of the
>> template means there's a refcounting bug somewhere.
>>
>>>  I fixed it in qla2xxx to illustrate the usage of the new function
>>> and scsi_host_template's flag; also, qla2xxx is where I actually
>>> observe crashes. Other drivers may do the same if they want to
>>> address the problem.
>>>
>>> Here are details on the qla2xxx crash repro, if that is what you were
>>> asking about. If I run "qaucli" utility that retrieves some info from
>>> the driver via SCSI mid-layer, and unload the driver in parallel, the
>>> kernel crashes with the following stack:
>>>
>>> [16834.636216,07] Call Trace:
>>>                                                           ...
>>> scsi_proc_hostdir_rm
>>> [16834.641944,07]  [<ffffffff8141723f>]
>>> scsi_host_dev_release+0x3f/0x130
>>> [16834.647740,07]  [<ffffffff813e4f82>] device_release+0x32/0xa0
>>> [16834.653423,07]  [<ffffffff812dc6c7>] kobject_cleanup+0x77/0x190
>>> [16834.659002,07]  [<ffffffff812dc585>] kobject_put+0x25/0x50
>>> [16834.664430,07]  [<ffffffff813e5277>] put_device+0x17/0x20
>>> [16834.669740,07]  [<ffffffff812d0334>]
>>> bsg_kref_release_function+0x24/0x30
>>> [16834.675007,07]  [<ffffffff812d14a6>] bsg_release+0x166/0x1d0
>>> [16834.680148,07]  [<ffffffff8119ba2b>] __fput+0xcb/0x1d0
>>> [16834.685156,07]  [<ffffffff8119bb6e>] ____fput+0xe/0x10
>>> [16834.690017,07]  [<ffffffff81077476>] task_work_run+0x86/0xb0
>>> [16834.694781,07]  [<ffffffff81057043>]
>>> exit_to_usermode_loop+0x6b/0x9a
>>> [16834.699466,07]  [<ffffffff81002875>]
>>> syscall_return_slowpath+0x55/0x60
>>> [16834.704110,07]  [<ffffffff8172d615>]
>>> int_ret_from_sys_call+0x25/0x9f
>>
>> This one's a bit baffling: open of the bsg device should have already
>> taken the module reference.  What was the actual error: NULL deref?
>>
>> The thing which is supposed to hold the module is the device open/close
>> which does scsi_device_put on sd_release ... unless this is some sort
>> of non-scsi device and qlogic forgot how to refcount?
>>
>>> On refcount for scsi_host_template: valid point, I did consider it.
>>> Existing drivers allocate scsi_host_template statically. We cannot
>>> change them all at once. So we have to allow 2 ways of allocating
>>> scsi_host_template: the dynamic one with refcounts and the static one
>>> for legacy driver support. That is kind of ugly, too. In addition,
>>> having a refcounted scsi_host_template after driver unload is
>>> confusing: the memory of scsi_host_template is OK, but any attempt to
>>> call a method from the template still causes a crash.
>>
>> No, the static template already is part of the module so it should be
>> refcounted as a module reference.
>>
>> James
>>



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux