On Thu, Jun 15, 2023 at 10:50 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Jun 15, 2023 at 05:33:26PM +0900, Damien Le Moal wrote: > > When an ATA port is resumed from sleep, the port is reset and a power > > management request issued to libata EH to reset the port and rescanning > > the device(s) attached to the port. Device rescanning is done by > > scheduling an ata_scsi_dev_rescan() work, which will execute > > scsi_rescan_device(). > > > > However, scsi_rescan_device() takes the generic device lock, which is > > also taken by dpm_resume() when the SCSI device is resumed as well. If > > a device rescan execution starts before the completion of the SCSI > > device resume, the rcu locking used to refresh the cached VPD pages of > > the device, combined with the generic device locking from > > scsi_rescan_device() and from dpm_resume() can cause a deadlock. > > > > Avoid this situation by changing struct ata_port scsi_rescan_task to be > > a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is > > modified to check if the SCSI device associated with the ATA device that > > must be rescanned is not suspended. If the SCSI device is still > > suspended, ata_scsi_dev_rescan() returns early and reschedule itself for > > execution after an arbitrary delay of 5ms. > > I don't understand the nature of the relationship between the ATA port > and the corresponding SCSI device. Maybe you could explain it more > fully, if you have time. > > But in any case, this approach seems like a layering violation. Why not > instead call a SCSI utility routine to set a "needs_rescan" flag in the > scsi_device structure? Then scsi_device_resume() could automatically > call scsi_rescan_device() -- or rather an internal version that assumes > the device lock is already held -- if the flag is set. Or it could > queue a non-delayed work routine to do this. (Is it important to have > the rescan finish before userspace starts up and tries to access the ATA > device again?) > > That, combined with a guaranteed order of resuming, would do what you > want, right? What you are suggesting is pretty much like my previous approach: https://lore.kernel.org/all/20230502150435.423770-2-kai.heng.feng@xxxxxxxxxxxxx/ Kai-Heng > > Alan Stern