Re: [PATCH] ata: libata-scsi: Avoid deadlock on rescan after device resume

On Thu, Jun 15, 2023 at 10:50 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Jun 15, 2023 at 05:33:26PM +0900, Damien Le Moal wrote:
> > When an ATA port is resumed from sleep, the port is reset and a power
> > management request is issued to libata EH to reset the port and rescan
> > the device(s) attached to the port. Device rescanning is done by
> > scheduling an ata_scsi_dev_rescan() work, which will execute
> > scsi_rescan_device().
> >
> > However, scsi_rescan_device() takes the generic device lock, which is
> > also taken by dpm_resume() when the SCSI device is resumed as well. If
> > a device rescan execution starts before the completion of the SCSI
> > device resume, the RCU locking used to refresh the cached VPD pages of
> > the device, combined with the generic device locking from
> > scsi_rescan_device() and from dpm_resume() can cause a deadlock.
> >
> > Avoid this situation by changing struct ata_port scsi_rescan_task to be
> > a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is
> > modified to check if the SCSI device associated with the ATA device that
> > must be rescanned is not suspended. If the SCSI device is still
> > suspended, ata_scsi_dev_rescan() returns early and reschedules itself for
> > execution after an arbitrary delay of 5ms.
>
> I don't understand the nature of the relationship between the ATA port
> and the corresponding SCSI device.  Maybe you could explain it more
> fully, if you have time.
>
> But in any case, this approach seems like a layering violation.  Why not
> instead call a SCSI utility routine to set a "needs_rescan" flag in the
> scsi_device structure?  Then scsi_device_resume() could automatically
> call scsi_rescan_device() -- or rather an internal version that assumes
> the device lock is already held -- if the flag is set.  Or it could
> queue a non-delayed work routine to do this.  (Is it important to have
> the rescan finish before userspace starts up and tries to access the ATA
> device again?)
>
> That, combined with a guaranteed order of resuming, would do what you
> want, right?

What you are suggesting is pretty much like my previous approach:
https://lore.kernel.org/all/20230502150435.423770-2-kai.heng.feng@xxxxxxxxxxxxx/

Kai-Heng

>
> Alan Stern



