Patch "ACPI: NFIT: fix a potential deadlock during NFIT teardown" has been added to the 5.15-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 26 Feb 2023 10:47:41 -0500

This is a note to let you know that I've just added the patch titled

    ACPI: NFIT: fix a potential deadlock during NFIT teardown

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     acpi-nfit-fix-a-potential-deadlock-during-nfit-teard.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit c1c18fcff3ea738260900d2299829e0596933f68
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date:   Wed Jan 25 11:34:18 2023 -0700

    ACPI: NFIT: fix a potential deadlock during NFIT teardown
    
    [ Upstream commit fb6df4366f86dd252bfa3049edffa52d17e7b895 ]
    
    Lockdep reports that acpi_nfit_shutdown() may deadlock against an
    opportune acpi_nfit_scrub(). acpi_nfit_scrub () is run from inside a
    'work' and therefore has already acquired workqueue-internal locks. It
    also acquiires acpi_desc->init_mutex. acpi_nfit_shutdown() first
    acquires init_mutex, and was subsequently attempting to cancel any
    pending workqueue items. This reversed locking order causes a potential
    deadlock:
    
        ======================================================
        WARNING: possible circular locking dependency detected
        6.2.0-rc3 #116 Tainted: G           O     N
        ------------------------------------------------------
        libndctl/1958 is trying to acquire lock:
        ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450
    
        but task is already holding lock:
        ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit]
    
        which lock already depends on the new lock.
    
        ...
    
        Possible unsafe locking scenario:
    
              CPU0                    CPU1
              ----                    ----
         lock(&acpi_desc->init_mutex);
                                      lock((work_completion)(&(&acpi_desc->dwork)->work));
                                      lock(&acpi_desc->init_mutex);
         lock((work_completion)(&(&acpi_desc->dwork)->work));
    
        *** DEADLOCK ***
    
    Since the workqueue manipulation is protected by its own internal locking,
    the cancellation of pending work doesn't need to be done under
    acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the
    init_mutex to fix the deadlock. Any work that starts after
    acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the
    cancel_delayed_work_sync() will safely flush it out.
    
    Reported-by: Dan Williams <dan.j.williams@xxxxxxxxx>
    Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
    Link: https://lore.kernel.org/r/20230112-acpi_nfit_lockdep-v1-1-660be4dd10be@xxxxxxxxx
    Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 7dd80acf92c78..2575d6c51f898 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3676,8 +3676,8 @@ void acpi_nfit_shutdown(void *data)
 
 	mutex_lock(&acpi_desc->init_mutex);
 	set_bit(ARS_CANCEL, &acpi_desc->scrub_flags);
-	cancel_delayed_work_sync(&acpi_desc->dwork);
 	mutex_unlock(&acpi_desc->init_mutex);
+	cancel_delayed_work_sync(&acpi_desc->dwork);
 
 	/*
 	 * Bounce the nvdimm bus lock to make sure any in-flight