Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2017-03-17 at 16:40 +0000, Bart Van Assche wrote:
> On Fri, 2017-03-17 at 05:54 -0700, James Bottomley wrote:
> > So it's better to use the module without a reference in place and 
> > take the risk that it may exit and release its code area while 
> > we're calling it?
> 
> Hello James,
> 
> My understanding of scsi_device_get() / scsi_device_put() is that the 
> reason why these manipulate the module reference count is to avoid 
> that a SCSI LLD module can be unloaded while a SCSI device is being 
> used from a context that is not notified about SCSI LLD unloading 
> (e.g. a file handle controlled by the sd driver or a SCSI ALUA device
> handler worker thread).

Not just that: it's so the underlying module is pinned for every
potential user as well.  the unblock code is called in places outside
the actual hba driver module, so it needs that protection.

> Does your comment mean that you think there is a scenario in which
> scsi_target_block() or scsi_target_unblock() can be called while the 
> text area of a SCSI LLD is being released? I have reviewed all the 
> callers of these functions but I have not found such a scenario.
> scsi_target_block() and scsi_target_unblock() are either called from 
> a SCSI transport layer implementation (FC, iSCSI, SRP) or from a SCSI 
> LLD kernel module (snic_disc). All these kernel modules only call 
> scsi_target_*block() for resources (rport or SCSI target
> respectively) that are removed before the code area of
> these modules is released. This is why I think that calling
> scsi_target_*block() without increasing the SCSI LLD module reference
> count is safe.

The transport code is above the HBA module code and in that code
unblock could be racing with module removal.  The original premise was
that once the dev/target/host goes into DEL, nothing can call into
queuecommand or get a reference to the device, so nothing halts removal
after that, but you changed that with your code, which is why it's now
unsafe.

> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index 82dfe07..fd1ba1d 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -39,6 +39,7 @@ static const struct {
> >  	{ SDEV_TRANSPORT_OFFLINE, "transport-offline" },
> >  	{ SDEV_BLOCK,	"blocked" },
> >  	{ SDEV_CREATED_BLOCK, "created-blocked" },
> > +	{ SDEV_CANCEL_BLOCK, "blocked" },
> >  };
> 
> The multipathd function path_offline() translates "blocked" into 
> PATH_PENDING. Shouldn't SDEV_CANCEL_BLOCK be translated by multipathd 
> into PATH_DOWN? There might be other user space applications that 
> interpret the SCSI device state and that I am not aware of.

Given it's very short lived, I don't think it much matters, but if you
think it does, that can become cancel.

> Additionally, your patch does not modify scsi_device_get() and hence
> will cause scsi_device_get() to succeed for devices that are in state
> SDEV_CANCEL_BLOCK. I think that's a subtle behavior change.

Yes, otherwise device unblock wouldn't work (on the off chance the
device is unblocked before we get to the sync cache).  Again, it's like
the open race: you could have got the reference just before the device
went into cancel and you're in the same position so there's actually no
substantive behaviour change at all, it just elongates the window where
you get a reference to a device you can't send commands to.

James


> Thanks,
> 
> Bart.




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux