Re: [PATCH] Waiting for scsi_host_template release

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hannes, James, thanks a lot for taking a look!

On the problem the patch is solving: it is in the "Description" part
of my initial e-mail. If you agree that a Scsi_Host may be around
after a driver has unloaded, the problem applies to any driver
creating a new Scsi_Host. I fixed it in qla2xxx to illustrate the
usage of the new function and scsi_host_template's flag; also, qla2xxx
is where I actually observe crashes. Other drivers may do the same if
they want to address the problem.

Here are details on the qla2xxx crash repro, if that is what you were
asking about. If I run "qaucli" utility that retrieves some info from
the driver via SCSI mid-layer, and unload the driver in parallel, the
kernel crashes with the following stack:

[16834.636216,07] Call Trace:
                                                          ...
scsi_proc_hostdir_rm
[16834.641944,07]  [<ffffffff8141723f>] scsi_host_dev_release+0x3f/0x130
[16834.647740,07]  [<ffffffff813e4f82>] device_release+0x32/0xa0
[16834.653423,07]  [<ffffffff812dc6c7>] kobject_cleanup+0x77/0x190
[16834.659002,07]  [<ffffffff812dc585>] kobject_put+0x25/0x50
[16834.664430,07]  [<ffffffff813e5277>] put_device+0x17/0x20
[16834.669740,07]  [<ffffffff812d0334>] bsg_kref_release_function+0x24/0x30
[16834.675007,07]  [<ffffffff812d14a6>] bsg_release+0x166/0x1d0
[16834.680148,07]  [<ffffffff8119ba2b>] __fput+0xcb/0x1d0
[16834.685156,07]  [<ffffffff8119bb6e>] ____fput+0xe/0x10
[16834.690017,07]  [<ffffffff81077476>] task_work_run+0x86/0xb0
[16834.694781,07]  [<ffffffff81057043>] exit_to_usermode_loop+0x6b/0x9a
[16834.699466,07]  [<ffffffff81002875>] syscall_return_slowpath+0x55/0x60
[16834.704110,07]  [<ffffffff8172d615>] int_ret_from_sys_call+0x25/0x9f

On refcount for scsi_host_template: valid point, I did consider it.
Existing drivers allocate scsi_host_template statically. We cannot
change them all at once. So we have to allow 2 ways of allocating
scsi_host_template: the dynamic one with refcounts and the static one
for legacy driver support. That is kind of ugly, too. In addition,
having a refcounted scsi_host_template after driver unload is
confusing: the memory of scsi_host_template is OK, but any attempt to
call a method from the template still causes a crash.

The advantage of this patch is low risk and relative simplicity. The
patch does not change the behavior of existing drivers interacting
with SCSI mid-layer unless they set the new flag and call the new
function.

Conceptually nicer, but more risky and more complex change is to fix
scsi_remove_host. SCSI mid-layer must not not touch Scsi_Host's
template after a driver calls scsi_remove_host for that Scsi_Host.


On Wed, Apr 11, 2018 at 7:39 AM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 2018-04-11 at 16:11 +0200, Hannes Reinecke wrote:
>> On Mon, 9 Apr 2018 23:23:51 -0700
>> Anatoliy Glagolev <glagolig@xxxxxxxxx> wrote:
>>
>> > Description:
>> > SCSI mid-layer may hold references to Scsi_Host structs when
>> > the owning module has already unloaded. Scsi_Host release path
>> > touches scsi_host_template struct that is usually allocated
>> > in the unloaded module's memory. That results in a crash.
>> > To work around the problem, this change implements
>> > scsi_host_template_release API to be called at driver unload
>> > path to make sure all Scsi_Host structs are gone before
>> > releasing scsi_host_template memory.
>> >
>> > ---
>> >  drivers/scsi/hosts.c          |  2 ++
>> >  drivers/scsi/qla2xxx/qla_os.c |  2 ++
>> >  drivers/scsi/scsi_priv.h      |  1 +
>> >  drivers/scsi/scsi_proc.c      | 64
>> > +++++++++++++++++++++++++++++++++++++++----
>> > include/scsi/scsi_host.h      | 17 ++++++++++++ 5 files changed, 80
>> > insertions(+), 6 deletions(-)
>> >
>>
>> Whee, that is ugly.
>
> And what's the actual problem it's solving?  It looks to be something
> in qla2xxx module removal?
>
>> Any particular reason why we can't do refcounting here?
>
> We can ... the template is module data and any reference to the dev or
> the host will increment the module reference.  We could even have a
> dummy template reference that only incremented the module refcount.
> However, knowing what to do involves knowing what the problem is and
> how it is triggered.
>
> James
>



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux