Re: PM-runtime: supplier looses track of consumer during probe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 2, 2022 at 1:23 PM Tushar Nimkar <quic_tnimkar@xxxxxxxxxxx> wrote:
>
> Thanks Adrian and Rafael,
> We are trying both patches separately. And will update result once we get.

Thank you!

I'm going to submit the change in rpm_idle() regardless of whether or
not it is sufficient to address the issue, because it is a clear
mistake to still call __rpm_callback() from there after adding the
handling of device links to it.

> On 12/2/2022 1:14 AM, Rafael J. Wysocki wrote:
> > On Thursday, December 1, 2022 8:28:25 PM CET Rafael J. Wysocki wrote:
> >> On Thu, Dec 1, 2022 at 2:10 PM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> >>>
> >>> On 29/11/22 18:56, Nitin Rawat wrote:
> >>>> Hi Adrian,
> >>>>
> >>>> On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
> >>>>> Hi Adrian,
> >>>>>
> >>>>> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
> >>>>>> On 4/11/22 11:19, Tushar Nimkar wrote:
> >>>>>>> Hi linux-pm/linux-scsi,
> >>>>>
> >>>>>>>> Process -1
> >>>>>>>> ufshcd_async_scan context (process 1)
> >>>>>>>> scsi_autopm_put_device() //0:0:0:0
> >>>>>>
> >>>>>> I am having trouble following your description.  What function is calling
> >>>>>> scsi_autopm_put_device() here?
> >>>>>>
> >>>>> Below is flow which calls scsi_autopm_put_device()
> >>>>> Process -1
> >>>>> ufshcd_async_scan()
> >>>>>       scsi_probe_and_add_lun()
> >>>>>           scsi_add_lun()
> >>>>>               slave_configure()
> >>>>>                   scsi_sysfs_add_sdev()
> >>>>>                       scsi_autopm_get_device()
> >>>>>                           device_add()     <- invoked [Process 2] sd_probe()
> >>>>>                               scsi_autopm_put_device()
> >>>>>
> >>>>>>>> pm_runtime_put_sync()
> >>>>>>>> __pm_runtime_idle()
> >>>>>>>> rpm_idle() -- RPM_GET_PUT(4)
> >>>>>>>>        __rpm_callback
> >>>>>>>>            scsi_runtime_idle()
> >>>>>>>>                pm_runtime_mark_last_busy()
> >>>>>>>>                pm_runtime_autosuspend()  --[A]
> >>>>>>>>                    rpm_suspend() -- RPM_AUTO(8)
> >>>>>>>>                        pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
> >>>>>>>>                            __update_runtime_status to RPM_SUSPENDING
> >>>>>>>>                        __rpm_callback()
> >>>>>>>>                            __rpm_put_suppliers(dev, false)
> >>>>>>>>                        __update_runtime_status to RPM_SUSPENDED
> >>>>>>>>                    rpm_suspend_suppliers()
> >>>>>>>>                        rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
> >>>>>>>>                    rpm_suspend() – END with return=0
> >>>>>>>>            scsi_runtime_idle() END return (-EBUSY) always.
> >>>>>>
> >>>>>> Not following here either.  Which device is EBUSY and why?
> >>>>>
> >>>>> scsi_runtime_idle() return -EBUSY always [3]
> >>>>> Storage/scsi team can better explain -EBUSY implementation.
> >>>>
> >>>> EBUSY is returned from below code for consumer dev 0:0:0:0.
> >>>> scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
> >>>> static int scsi_runtime_idle(struct device *dev)
> >>>> {
> >>>>      :
> >>>>
> >>>>      if (scsi_is_sdev_device(dev)) {
> >>>>          pm_runtime_mark_last_busy(dev);
> >>>>          pm_runtime_autosuspend(dev);
> >>>>          return -EBUSY; ---> EBUSY returned from here.
> >>>>      }
> >>>>
> >>>>
> >>>> }
> >>>>
> >>>>>
> >>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
> >>>>>
> >>>>>
> >>>>>>>>
> >>>>>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
> >>>>>>>> [2]: https://lkml.org/lkml/2022/10/12/259
> >>>
> >>> It looks to me like __rpm_callback() makes assumptions about
> >>> dev->power.runtime_status that are not necessarily true because
> >>> dev->power.lock is dropped.
> >>
> >> Well, this happens because rpm_idle() calls __rpm_callback() and
> >> allows it to run concurrently with rpm_suspend() and rpm_resume(), so
> >> one of them may change runtime_status to RPM_SUSPENDING or
> >> RPM_RESUMING while __rpm_callback() is running.
> >>
> >> It is somewhat questionable whether or not this should be allowed to
> >> happen, but since it is generally allowed to suspend the device from
> >> its .runtime_idle callback, there is not too much that can be done
> >> about it.
> >
> > But this means that the patch below should help too.
> >
> > I actually think that we can do both, because rpm_idle() doesn't have to do
> > the whole device links dance and the fact that it still calls __rpm_callback()
> > is a clear oversight.
> >
> > ---
> >   drivers/base/power/runtime.c |   12 +++++++++++-
> >   1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -484,7 +484,17 @@ static int rpm_idle(struct device *dev,
> >
> >       dev->power.idle_notification = true;
> >
> > -     retval = __rpm_callback(callback, dev);
> > +     if (dev->power.irq_safe)
> > +             spin_unlock(&dev->power.lock);
> > +     else
> > +             spin_unlock_irq(&dev->power.lock);
> > +
> > +     retval = callback(dev);
> > +
> > +     if (dev->power.irq_safe)
> > +             spin_lock(&dev->power.lock);
> > +     else
> > +             spin_lock_irq(&dev->power.lock);
> >
> >       dev->power.idle_notification = false;
> >       wake_up_all(&dev->power.wait_queue);
> >
> >
> >



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux