Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 08, 2017 at 01:41:06PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
> > On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> > > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > > >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> > > >> > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > >> >
> > > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > > >>
> > > >> OK, good catch!
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > Cc: stable@xxxxxxxxxxxxxxx
> > > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > >> > Cc: Len Brown <lenb@xxxxxxxxxx>
> > > >> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > >> > Cc: Tejun Heo <tj@xxxxxxxxxx>
> > > >> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > >> > ---
> > > >> >  drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > > >> >  1 file changed, 12 insertions(+), 9 deletions(-)
> > > >> >
> > > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > > >> > index fbcc73f7a099..18af71057b44 100644
> > > >> > --- a/drivers/acpi/device_pm.c
> > > >> > +++ b/drivers/acpi/device_pm.c
> > > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > > >> >
> > > >> >  #ifdef CONFIG_PM
> > > >> >  static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > > >> >
> > > >> >  void acpi_pm_wakeup_event(struct device *dev)
> > > >> >  {
> > > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > > >> >         if (!dev && !func)
> > > >> >                 return AE_BAD_PARAMETER;
> > > >> >
> > > >> > -       mutex_lock(&acpi_pm_notifier_lock);
> > > >> > +       mutex_lock(&acpi_pm_notifier_install_lock);
> > > >> >
> > > >> >         if (adev->wakeup.flags.notifier_present)
> > > >> >                 goto out;
> > > >> >
> > > >> > -       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > -       adev->wakeup.context.dev = dev;
> > > >> > -       adev->wakeup.context.func = func;
> > > >> > -
> > > >>
> > > >> But this doesn't look good to me.
> > > >>
> > > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > > >>
> > > >> Actually, acpi_install_notify_handler() itself need not be called
> > > >> under the lock, because it does sufficient checks of its own.
> > > >>
> > > >> So say you do
> > > >>
> > > >> mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> if (adev->wakeup.context.dev)
> > > >>         goto out;
> > > >>
> > > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> adev->wakeup.context.dev = dev;
> > > >> adev->wakeup.context.func = func;
> > > >>
> > > >> mutex_unlock(&acpi_pm_notifier_lock);
> > > >>
> > > >> >         status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > > >> >                                              acpi_pm_notify_handler, NULL);
> > > >> >         if (ACPI_FAILURE(status))
> > > >> >                 goto out;
> > > >> >
> > > >> > +       mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > > >>
> > > >> > +       adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > +       adev->wakeup.context.dev = dev;
> > > >> > +       adev->wakeup.context.func = func;
> > > >> >         adev->wakeup.flags.notifier_present = true;
> > > >> > +       mutex_unlock(&acpi_pm_notifier_lock);
> > > >> >
> > > >> >   out:
> > > >> > -       mutex_unlock(&acpi_pm_notifier_lock);
> > > >> > +       mutex_unlock(&acpi_pm_notifier_install_lock);
> > > >> >         return status;
> > > >> >  }
> > > >>
> > > >> Then on removal you can clear notifier_present first and drop the lock
> > > >> around the acpi_remove_notify_handler() call and nothing bad will
> > > >> happen.
> > > >>
> > > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > > >> instance will set context.dev and the second one will see it set and
> > > >> bail out.  The first instance will then do the rest.
> > > >>
> > > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > > >> instance will see notifier_present set and will clear it, so the
> > > >> second one will see notifier_present unset and it will bail out.
> > > >>
> > > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > > >> unset and bail out, or the latter will see context.dev unset and bail
> > > >> out.
> > > >>
> > > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > > >
> > > > So something like the below (totally untested) should work too, shouldn't it?
> > > 
> > > There is a problem if a device is removed while acpi_add_pm_notifier()
> > > is still in progress, in which case with my patch the
> > > acpi_remove_pm_notifier() called from the removal path may bail out
> > > prematurely (that doesn't seem likely to happen, but still I don't see
> > > why it cannot happen), so I'll just use your patch. :-)
> > 
> > OK. I was just looking at your version and was pretty much convinced
> > that it would work. But I'll take your word that it might not :)
> 
> Well, you don't have to. :-)
> 
> The scenario I have in mind is as follows:
> 
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
>    lock.  notifier_present is still unset.
> 
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
>    It is (still) unset, so the function decides that there's nothing to do.
> 
> 3. acpi_add_pm_notifier() continues with notifier installation and the
>    device goes away at the same time.

Right. That makes sense. Though I don't know what would prevent racing
add_pm_notifier() against remove_pm_notifier() in a similar way even
with the outer lock. In that case the remove_pm_notifier() would just
have to get at the mutex first and then we'd still end up with the
notifier installed.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux