Re: PM-runtime: supplier loses track of consumer during probe

On Thu, Dec 1, 2022 at 2:10 PM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>
> On 29/11/22 18:56, Nitin Rawat wrote:
> > Hi Adrian,
> >
> > On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
> >> Hi Adrian,
> >>
> >> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
> >>> On 4/11/22 11:19, Tushar Nimkar wrote:
> >>>> Hi linux-pm/linux-scsi,
> >>
> >>>>> Process -1
> >>>>> ufshcd_async_scan context (process 1)
> >>>>> scsi_autopm_put_device() //0:0:0:0
> >>>
> >>> I am having trouble following your description.  What function is calling
> >>> scsi_autopm_put_device() here?
> >>>
> >> Below is the flow that calls scsi_autopm_put_device():
> >> Process -1
> >> ufshcd_async_scan()
> >>      scsi_probe_and_add_lun()
> >>          scsi_add_lun()
> >>              slave_configure()
> >>                  scsi_sysfs_add_sdev()
> >>                      scsi_autopm_get_device()
> >>                          device_add()     <- invoked [Process 2] sd_probe()
> >>                              scsi_autopm_put_device()
> >>
> >>>>> pm_runtime_put_sync()
> >>>>> __pm_runtime_idle()
> >>>>> rpm_idle() -- RPM_GET_PUT(4)
> >>>>>       __rpm_callback
> >>>>>           scsi_runtime_idle()
> >>>>>               pm_runtime_mark_last_busy()
> >>>>>               pm_runtime_autosuspend()  --[A]
> >>>>>                   rpm_suspend() -- RPM_AUTO(8)
> >>>>>                       pm_runtime_autosuspend_expiration(): use_autosuspend is false, returns 0   --- [B]
> >>>>>                           __update_runtime_status to RPM_SUSPENDING
> >>>>>                       __rpm_callback()
> >>>>>                           __rpm_put_suppliers(dev, false)
> >>>>>                       __update_runtime_status to RPM_SUSPENDED
> >>>>>                   rpm_suspend_suppliers()
> >>>>>                       rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
> >>>>>                   rpm_suspend() – END with return=0
> >>>>>           scsi_runtime_idle() END return (-EBUSY) always.
> >>>
> >>> Not following here either.  Which device is EBUSY and why?
> >>
> >> scsi_runtime_idle() always returns -EBUSY [3].
> >> The storage/SCSI team can better explain the -EBUSY implementation.
> >
> > EBUSY is returned from the code below for consumer dev 0:0:0:0.
> > scsi_runtime_idle() is called from scsi_autopm_put_device(), which in turn is called from ufshcd_async_scan() (Process 1 in the call stack above):
> > static int scsi_runtime_idle(struct device *dev)
> > {
> >     :
> >
> >     if (scsi_is_sdev_device(dev)) {
> >         pm_runtime_mark_last_busy(dev);
> >         pm_runtime_autosuspend(dev);
> >         return -EBUSY; /* <-- EBUSY is returned from here */
> >     }
> >
> >
> > }
> >
> >>
> >> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
> >>
> >>
> >>>>>
> >>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
> >>>>> [2]: https://lkml.org/lkml/2022/10/12/259
>
> It looks to me like __rpm_callback() makes assumptions about
> dev->power.runtime_status that are not necessarily true because
> dev->power.lock is dropped.

Well, this happens because rpm_idle() calls __rpm_callback() and
allows it to run concurrently with rpm_suspend() and rpm_resume(), so
one of them may change runtime_status to RPM_SUSPENDING or
RPM_RESUMING while __rpm_callback() is running.

It is somewhat questionable whether or not this should be allowed to
happen, but since it is generally allowed to suspend the device from
its .runtime_idle callback, there is not too much that can be done
about it.

> AFAICT the intention of the code would be fulfilled by instead using the
> status as it was before the lock was dropped.

That's correct, so the patch should help, but it also needs to remove
the comment stating that the runtime status cannot change when
__rpm_callback() is running, which is clearly incorrect.
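
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) contrasting a check against the live status with a check
against a snapshot taken before the callback runs. All toy_* names and the
supplier_refs counter are hypothetical stand-ins invented for illustration.
The enum only mirrors the kernel's status names, and the status change
inside the callback merely models what another context could do while the
lock is dropped.

```c
#include <assert.h>
#include <errno.h>

/* Toy copy of the runtime PM status names; illustration only. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

/* Hypothetical stand-in for a device: a status plus one counter that
 * represents references the device holds on its suppliers. */
struct toy_dev {
	enum rpm_status runtime_status;
	int supplier_refs;
};

/* Hypothetical idle callback. It models a concurrent resume by flipping
 * the status to RPM_RESUMING, then fails with -EBUSY, much like
 * scsi_runtime_idle() always returns -EBUSY for an sdev. */
int toy_idle_cb(struct toy_dev *dev)
{
	dev->runtime_status = RPM_RESUMING;
	return -EBUSY;
}

/* Checks the live status after the callback has run. */
int toy_callback_live(int (*cb)(struct toy_dev *), struct toy_dev *dev)
{
	int retval = cb(dev);

	if ((dev->runtime_status == RPM_SUSPENDING && !retval) ||
	    (dev->runtime_status == RPM_RESUMING && retval))
		dev->supplier_refs--;	/* reference dropped by mistake */

	return retval;
}

/* Checks the status as it was before the callback ran. */
int toy_callback_snapshot(int (*cb)(struct toy_dev *), struct toy_dev *dev)
{
	enum rpm_status runtime_status = dev->runtime_status;
	int retval = cb(dev);

	if ((runtime_status == RPM_SUSPENDING && !retval) ||
	    (runtime_status == RPM_RESUMING && retval))
		dev->supplier_refs--;

	return retval;
}
```

Starting from RPM_ACTIVE with one supplier reference, the live check drops
the reference because it sees RPM_RESUMING together with a nonzero return
value, while the snapshot check leaves the reference alone.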

> Consequently, perhaps you could try this:
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index b52049098d4e..3cf9abc3b2c2 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -365,6 +365,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>  {
>         int retval = 0, idx;
>         bool use_links = dev->power.links_count > 0;
> +       enum rpm_status runtime_status = dev->power.runtime_status;
>
>         if (dev->power.irq_safe) {
>                 spin_unlock(&dev->power.lock);
> @@ -378,7 +379,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>                  * routine returns, so it is safe to read the status outside of
>                  * the lock.
>                  */
> -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> +               if (use_links && runtime_status == RPM_RESUMING) {
>                         idx = device_links_read_lock();
>
>                         retval = rpm_get_suppliers(dev);
> @@ -405,8 +406,8 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>                  * Do that if resume fails too.
>                  */
>                 if (use_links
> -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> +                   && ((runtime_status == RPM_SUSPENDING && !retval)
> +                   || (runtime_status == RPM_RESUMING && retval))) {
>                         idx = device_links_read_lock();
>
>                         __rpm_put_suppliers(dev, false);
>
>


