On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote: > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote: >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote: >> > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> >> > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while >> > holding acpi_pm_notifier_lock, and that same lock is taken by >> > by the work via acpi_pm_notify_handler(). This can deadlock. >> >> OK, good catch! >> >> [cut] >> >> > >> > Cc: stable@xxxxxxxxxxxxxxx >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> >> > Cc: Len Brown <lenb@xxxxxxxxxx> >> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> >> > Cc: Tejun Heo <tj@xxxxxxxxxx> >> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications") >> > Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> >> > --- >> > drivers/acpi/device_pm.c | 21 ++++++++++++--------- >> > 1 file changed, 12 insertions(+), 9 deletions(-) >> > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c >> > index fbcc73f7a099..18af71057b44 100644 >> > --- a/drivers/acpi/device_pm.c >> > +++ b/drivers/acpi/device_pm.c >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable); >> > >> > #ifdef CONFIG_PM >> > static DEFINE_MUTEX(acpi_pm_notifier_lock); >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock); >> > >> > void acpi_pm_wakeup_event(struct device *dev) >> > { >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, >> > if (!dev && !func) >> > return AE_BAD_PARAMETER; >> > >> > - mutex_lock(&acpi_pm_notifier_lock); >> > + mutex_lock(&acpi_pm_notifier_install_lock); >> > >> > if (adev->wakeup.flags.notifier_present) >> > goto out; >> > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); >> > - adev->wakeup.context.dev = dev; >> > - adev->wakeup.context.func = func; >> > - >> >> But this doesn't look good to me. >> >> notifier_present should be checked under acpi_pm_notifier_lock. >> >> Actually, acpi_install_notify_handler() itself need not be called >> under the lock, because it does sufficient checks of its own. >> >> So say you do >> >> mutex_lock(&acpi_pm_notifier_lock); >> >> if (adev->wakeup.context.dev) >> goto out; >> >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); >> adev->wakeup.context.dev = dev; >> adev->wakeup.context.func = func; >> >> mutex_unlock(&acpi_pm_notifier_lock); >> >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, >> > acpi_pm_notify_handler, NULL); >> > if (ACPI_FAILURE(status)) >> > goto out; >> > >> > + mutex_lock(&acpi_pm_notifier_lock); >> >> And here you just set notifier_present under acpi_pm_notifier_lock. >> >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); >> > + adev->wakeup.context.dev = dev; >> > + adev->wakeup.context.func = func; >> > adev->wakeup.flags.notifier_present = true; >> > + mutex_unlock(&acpi_pm_notifier_lock); >> > >> > out: >> > - mutex_unlock(&acpi_pm_notifier_lock); >> > + mutex_unlock(&acpi_pm_notifier_install_lock); >> > return status; >> > } >> >> Then on removal you can clear notifier_present first and drop the lock >> around the acpi_remove_notify_handler() call and nothing bad will >> happen. >> >> If you call acpi_add_pm_notifier() twice in parallel, the first >> instance will set context.dev and the second one will see it set and >> bail out. The first instance will then do the rest. >> >> If you call acpi_remove_pm_notifier() twice in a row, the first >> instance will see notifier_present set and will clear it, so the >> second one will see notifier_present unset and it will bail out. >> >> Now, if you call acpi_remove_pm_notifier() in parallel with >> acpi_add_pm_notifier(), either the former will see notifier_present >> unset and bail out, or the latter will see context.dev unset and bail >> out. >> >> It doesn't look like the outer lock is needed, or have I missed anything? > > So something like the below (totally untested) should work too, shouldn't it? There is a problem if a device is removed while acpi_add_pm_notifier() is still in progress, in which case with my patch the acpi_remove_pm_notifier() called from the removal path may bail out prematurely (that doesn't seem likely to happen, but still I don't see why it cannot happen), so I'll just use your patch. :-) Thanks, Rafael