Hannes, James, thanks a lot for taking a look! On the problem the patch is solving: it is in the "Description" part of my initial e-mail. If you agree that a Scsi_Host may be around after a driver has unloaded, the problem applies to any driver creating a new Scsi_Host. I fixed it in qla2xxx to illustrate the usage of the new function and scsi_host_template's flag; also, qla2xxx is where I actually observe crashes. Other drivers may do the same if they want to address the problem. Here are details on the qla2xxx crash repro, if that is what you were asking about. If I run "qaucli" utility that retrieves some info from the driver via SCSI mid-layer, and unload the driver in parallel, the kernel crashes with the following stack: [16834.636216,07] Call Trace: ... scsi_proc_hostdir_rm [16834.641944,07] [<ffffffff8141723f>] scsi_host_dev_release+0x3f/0x130 [16834.647740,07] [<ffffffff813e4f82>] device_release+0x32/0xa0 [16834.653423,07] [<ffffffff812dc6c7>] kobject_cleanup+0x77/0x190 [16834.659002,07] [<ffffffff812dc585>] kobject_put+0x25/0x50 [16834.664430,07] [<ffffffff813e5277>] put_device+0x17/0x20 [16834.669740,07] [<ffffffff812d0334>] bsg_kref_release_function+0x24/0x30 [16834.675007,07] [<ffffffff812d14a6>] bsg_release+0x166/0x1d0 [16834.680148,07] [<ffffffff8119ba2b>] __fput+0xcb/0x1d0 [16834.685156,07] [<ffffffff8119bb6e>] ____fput+0xe/0x10 [16834.690017,07] [<ffffffff81077476>] task_work_run+0x86/0xb0 [16834.694781,07] [<ffffffff81057043>] exit_to_usermode_loop+0x6b/0x9a [16834.699466,07] [<ffffffff81002875>] syscall_return_slowpath+0x55/0x60 [16834.704110,07] [<ffffffff8172d615>] int_ret_from_sys_call+0x25/0x9f On refcount for scsi_host_template: valid point, I did consider it. Existing drivers allocate scsi_host_template statically. We cannot change them all at once. So we have to allow 2 ways of allocating scsi_host_template: the dynamic one with refcounts and the static one for legacy driver support. That is kind of ugly, too. In addition, having a refcounted scsi_host_template after driver unload is confusing: the memory of scsi_host_template is OK, but any attempt to call a method from the template still causes a crash. The advantage of this patch is low risk and relative simplicity. The patch does not change the behavior of existing drivers interacting with SCSI mid-layer unless they set the new flag and call the new function. Conceptually nicer, but more risky and more complex change is to fix scsi_remove_host. SCSI mid-layer must not not touch Scsi_Host's template after a driver calls scsi_remove_host for that Scsi_Host. On Wed, Apr 11, 2018 at 7:39 AM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: > On Wed, 2018-04-11 at 16:11 +0200, Hannes Reinecke wrote: >> On Mon, 9 Apr 2018 23:23:51 -0700 >> Anatoliy Glagolev <glagolig@xxxxxxxxx> wrote: >> >> > Description: >> > SCSI mid-layer may hold references to Scsi_Host structs when >> > the owning module has already unloaded. Scsi_Host release path >> > touches scsi_host_template struct that is usually allocated >> > in the unloaded module's memory. That results in a crash. >> > To work around the problem, this change implements >> > scsi_host_template_release API to be called at driver unload >> > path to make sure all Scsi_Host structs are gone before >> > releasing scsi_host_template memory. >> > >> > --- >> > drivers/scsi/hosts.c | 2 ++ >> > drivers/scsi/qla2xxx/qla_os.c | 2 ++ >> > drivers/scsi/scsi_priv.h | 1 + >> > drivers/scsi/scsi_proc.c | 64 >> > +++++++++++++++++++++++++++++++++++++++---- >> > include/scsi/scsi_host.h | 17 ++++++++++++ 5 files changed, 80 >> > insertions(+), 6 deletions(-) >> > >> >> Whee, that is ugly. > > And what's the actual problem it's solving? It looks to be something > in qla2xxx module removal? > >> Any particular reason why we can't do refcounting here? > > We can ... the template is module data and any reference to the dev or > the host will increment the module reference. We could even have a > dummy template reference that only incremented the module refcount. > However, knowing what to do involves knowing what the problem is and > how it is triggered. > > James >