Re: [PATCH] scsi: scsi_scan purge devices no longer in reported LUN list

Brian Bunker <brian@xxxxxxxxxxxxxxx> · Thu, 6 Jul 2023 09:58:56 -0700

> On Jul 29, 2022, at 4:38 PM, Uday Shankar <ushankar@xxxxxxxxxxxxxxx> wrote:
> 
> Hannes, I understand that Brian reached out to you for feedback on this
> patch. I still have doubts I'd like to clarify; I quote portions of
> your response below.
> 
>> Biggest problem is that we currently cannot 'reload' an existing SCSI
>> device, as the inquiry data is fixed.
> 
> I agree; scsi_probe_and_add_lun called with rescan == SCSI_SCAN_MANUAL
> on a LUN for which we already have a struct scsi_device seems to be
> essentially a no-op. scsi_rescan_device will update VPD, but not other
> inquiry data.
> 
>> So if we come across things like EMC Clariion which changes the
>> inquiry data for LUN0 when mapping devices to the host we don't have
>> any other choice but to physically remove the device and rescan it
>> again. Which is okay if you run under multipath, but for directly
>> accessed devices it'll kill your machine :-(
> 
> I don't understand how a "reload" will help in this scenario. I don't
> know the specifics of the EMC Clariion behavior, but based on your
> description and what I've read in the driver code I assume the device
> changes the PDT/PQ fields in the LUN 0 inquiry depending on whether or
> not there is storage attached to it. There are two "transitions:"
> 
> Attaching storage to LUN 0: We don't save a struct scsi_device for
> devices whose PDT/PQ indicates "no storage attached," so when storage
> gets attached and PDT/PQ changes, scsi_probe_and_add_lun will act as if
> it's seeing a new device for the first time. Everything should work.
> 
> Detaching storage from LUN 0: The current implementation of target scan
> won't pick up the updated inquiry data, sure, but a "reload" can't save
> your machine from dying if programs were accessing the LUN 0 volume
> directly, can it? Regardless of what the host does, the fact remains
> that it can no longer do I/O on the LUN 0 volume. The only thing the
> host can control is the particular flavor of errors delivered to these
> programs, and the one associated with "device is gone" seems to be most
> accurate, and the one that Brian's patch (if it applied to all devices,
> not just those with vendor PURE) would deliver.
> 
> Overall: we'd like to eliminate the need for manual rescans wherever
> possible, and we're willing to revise the patch and/or submit patches
> elsewhere as needed to achieve that goal. Please advise.
> 
> Thanks,
> Uday

I recently came across the Red Hat solution linked below, which shows that
open source provisioning applications have run into the same issue:
orphaned devices being reused by different volumes. In a more dynamic
world where connections and disconnections are more common, does this
change the thinking around purging, in the kernel, devices whose LUN IDs
are not returned in the reported LUN list? It seems that each of these
provisioning tools will hit this unless it accounts for it specifically.

https://access.redhat.com/solutions/7012184
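
To make the proposed purge concrete, here is a rough userspace sketch of the
comparison involved: find every LUN the host still has a device for that is
missing from the reported LUN list. This is not the patch itself. The names
lun_is_reported() and find_stale_luns() are made up for illustration, and the
real change would walk the target's struct scsi_device list in scsi_scan.c
rather than plain arrays.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if lun appears in the reported LUN list. */
static bool lun_is_reported(uint64_t lun, const uint64_t *reported,
			    size_t n_reported)
{
	for (size_t i = 0; i < n_reported; i++)
		if (reported[i] == lun)
			return true;
	return false;
}

/*
 * Hypothetical helper: copy into stale[] every known LUN that is absent
 * from the reported list and return how many were found. stale[] must
 * have room for n_known entries. A purge would then remove the devices
 * behind these LUNs instead of leaving them orphaned.
 */
static size_t find_stale_luns(const uint64_t *known, size_t n_known,
			      const uint64_t *reported, size_t n_reported,
			      uint64_t *stale)
{
	size_t n_stale = 0;

	for (size_t i = 0; i < n_known; i++)
		if (!lun_is_reported(known[i], reported, n_reported))
			stale[n_stale++] = known[i];
	return n_stale;
}
```

With known LUNs {0, 1, 2, 5} and a reported list of {0, 2}, this flags LUNs 1
and 5 as the ones a purge would remove.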

Thanks,
Brian