Re: [PATCH v2 6/7] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock

On Wed, Jul 17, 2019 at 06:08:21PM -0700, Dan Williams wrote:
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:

   INFO: task ndctl:2924 blocked for more than 122 seconds.
         Tainted: G           OE     5.2.0-rc4+ #3382
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   ndctl           D    0  2924   1176 0x00000000
   Call Trace:
    ? __schedule+0x27e/0x780
    schedule+0x30/0xb0
    wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
    ? finish_wait+0x80/0x80
    uuid_store+0xe6/0x2e0 [libnvdimm]
    kernfs_fop_write+0xf0/0x1a0
    vfs_write+0xb7/0x1b0
    ksys_write+0x5c/0xd0
    do_syscall_64+0x60/0x240

    INFO: task ndctl:2923 blocked for more than 122 seconds.
          Tainted: G           OE     5.2.0-rc4+ #3382
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    ndctl           D    0  2923   1175 0x00000000
    Call Trace:
     ? __schedule+0x27e/0x780
     ? __mutex_lock+0x489/0x910
     schedule+0x30/0xb0
     schedule_preempt_disabled+0x11/0x20
     __mutex_lock+0x48e/0x910
     ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
     ? __lock_acquire+0x23f/0x1710
     ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
     nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
     __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
     ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
     dax_pmem_probe+0xc/0x20 [dax_pmem]
     nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
     really_probe+0xef/0x390
     driver_probe_device+0xb4/0x100

In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.

Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
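
The locking rule described above (hold the device lock on entry, drop it
while waiting for probes to drain, take it again before returning) can be
sketched in user space with pthreads. This is only an illustration under
stated assumptions, not the patch itself: device_lock, state_lock,
probe_idle, probe_active, wait_probe_idle(), probe_thread() and run_demo()
are all hypothetical stand-ins, and the real libnvdimm code uses kernel
primitives rather than pthreads.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins, not the libnvdimm data structures. */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER; /* "namespace device_lock" */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects probe_active */
static pthread_cond_t probe_idle = PTHREAD_COND_INITIALIZER;
static int probe_active;

/* Must be called with device_lock held; returns with device_lock held. */
static void wait_probe_idle(void)
{
	pthread_mutex_lock(&state_lock);
	while (probe_active > 0) {
		/* Drop the device lock so the probe path can acquire it. */
		pthread_mutex_unlock(&device_lock);
		pthread_cond_wait(&probe_idle, &state_lock);
		/* Re-acquire in the original order: device_lock, then state_lock. */
		pthread_mutex_unlock(&state_lock);
		pthread_mutex_lock(&device_lock);
		pthread_mutex_lock(&state_lock);
	}
	pthread_mutex_unlock(&state_lock);
}

/* Probe path: needs the device lock briefly, then reports completion. */
static void *probe_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&device_lock);   /* e.g. validate exclusive access */
	pthread_mutex_unlock(&device_lock);

	pthread_mutex_lock(&state_lock);
	probe_active--;
	pthread_cond_broadcast(&probe_idle);
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

/* Writer path: holds the device lock, then flushes the probe path. */
static int run_demo(void)
{
	pthread_t probe;
	int rc;

	probe_active = 1;                   /* one probe is in flight */
	pthread_mutex_lock(&device_lock);
	rc = pthread_create(&probe, NULL, probe_thread, NULL);
	assert(rc == 0);
	(void)rc;

	wait_probe_idle();                  /* would hang if the lock were kept held */
	pthread_mutex_unlock(&device_lock);

	pthread_join(probe, NULL);
	return probe_active;                /* number of probes still pending */
}
```

Calling run_demo() from a main() and building with -pthread should finish
with no pending probes. If wait_probe_idle() kept device_lock held across
the wait, probe_thread() could never take it and both threads would block,
which is the ABBA pattern in the traces above.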

Cc: <stable@xxxxxxxxxxxxxxx>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Tested-by: Jane Chu <jane.chu@xxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

Hi Dan,

The way these patches are split, when we take them to stable this patch
won't apply because it wants "libnvdimm/bus: Prepare the nd_ioctl() path
to be re-entrant".

If you send another iteration of this patchset, could you please
re-order the patches so they apply cleanly to stable? This would
reduce mail exchanges later on and lower the risk of a mis-merge into
stable.

If not, this should serve as a reference for future us to double check
the backport.

--
Thanks,
Sasha


