On Wed, Jul 17, 2019 at 7:05 PM Sasha Levin <sashal@xxxxxxxxxx> wrote:
>
> On Wed, Jul 17, 2019 at 06:08:21PM -0700, Dan Williams wrote:
> >A multithreaded namespace creation/destruction stress test currently
> >deadlocks with the following lockup signature:
> >
> >    INFO: task ndctl:2924 blocked for more than 122 seconds.
> >          Tainted: G OE 5.2.0-rc4+ #3382
> >    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >    ndctl D 0 2924 1176 0x00000000
> >    Call Trace:
> >     ? __schedule+0x27e/0x780
> >     schedule+0x30/0xb0
> >     wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
> >     ? finish_wait+0x80/0x80
> >     uuid_store+0xe6/0x2e0 [libnvdimm]
> >     kernfs_fop_write+0xf0/0x1a0
> >     vfs_write+0xb7/0x1b0
> >     ksys_write+0x5c/0xd0
> >     do_syscall_64+0x60/0x240
> >
> >    INFO: task ndctl:2923 blocked for more than 122 seconds.
> >          Tainted: G OE 5.2.0-rc4+ #3382
> >    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >    ndctl D 0 2923 1175 0x00000000
> >    Call Trace:
> >     ? __schedule+0x27e/0x780
> >     ? __mutex_lock+0x489/0x910
> >     schedule+0x30/0xb0
> >     schedule_preempt_disabled+0x11/0x20
> >     __mutex_lock+0x48e/0x910
> >     ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
> >     ? __lock_acquire+0x23f/0x1710
> >     ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
> >     nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
> >     __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
> >     ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
> >     dax_pmem_probe+0xc/0x20 [dax_pmem]
> >     nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
> >     really_probe+0xef/0x390
> >     driver_probe_device+0xb4/0x100
> >
> >In this sequence an 'nd_dax' device is being probed and trying to take
> >the lock on its backing namespace to validate that the 'nd_dax' device
> >indeed has exclusive access to the backing namespace. Meanwhile, another
> >thread is trying to update the uuid property of that same backing
> >namespace. So one thread is in the probe path trying to acquire the
> >lock, and the other thread has acquired the lock and tries to flush the
> >probe path.
> >
> >Fix this deadlock by not holding the namespace device_lock over the
> >wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
> >the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
> >subsequently dropped internally to wait_nvdimm_bus_probe_idle().
> >
> >Cc: <stable@xxxxxxxxxxxxxxx>
> >Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
> >Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> >Tested-by: Jane Chu <jane.chu@xxxxxxxxxx>
> >Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>
> Hi Dan,
>
> The way these patches are split, when we take them to stable this patch
> won't apply because it wants "libnvdimm/bus: Prepare the nd_ioctl() path
> to be re-entrant".
>
> If you were to send another iteration of this patchset, could you please
> re-order the patches so they will apply cleanly to stable? this will
> help with reducing mail exchanges later on and possibly a mis-merge into
> stable.
>
> If not, this should serve as a reference for future us to double check
> the backport.

Oh we should backport all of them. I'll tag that one for -stable as
well. It's a hard pre-requisite for the fix.
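
For anyone double checking the backport later, the lock ordering described in the quoted changelog can be modelled in user space. What follows is only a rough sketch in Python, not the kernel code: `Bus`, `probe`, `wait_probe_idle` and `run_demo` are hypothetical names, a `threading.Lock` stands in for the namespace device_lock, and a counter guarded by a condition variable stands in for the bus probe-active state. Retaking the lock before returning is an assumption of the sketch; the changelog only says the lock is held on entry and dropped internally.

```python
import threading


class Bus:
    """Hypothetical stand-in for the bus-wide 'probe in flight' state."""

    def __init__(self):
        self.cond = threading.Condition()
        self.probe_active = 0


def probe(bus, dev_lock, log):
    # Models the probe path: it needs the device lock to validate the
    # backing namespace, then reports that probing has finished.
    with dev_lock:
        log.append("probe took device lock")
    with bus.cond:
        bus.probe_active -= 1
        bus.cond.notify_all()


def wait_probe_idle(bus, dev_lock):
    # Called with dev_lock held. The lock is dropped across the wait so
    # the probe thread can make progress, then retaken (assumption).
    dev_lock.release()
    try:
        with bus.cond:
            idle = bus.cond.wait_for(lambda: bus.probe_active == 0, timeout=5)
    finally:
        dev_lock.acquire()
    return idle


def run_demo():
    bus = Bus()
    dev_lock = threading.Lock()
    log = []
    bus.probe_active = 1   # a probe is already in flight
    dev_lock.acquire()     # the sysfs writer holds the device lock
    t = threading.Thread(target=probe, args=(bus, dev_lock, log))
    t.start()              # probe blocks on dev_lock until it is dropped
    idle = wait_probe_idle(bus, dev_lock)
    log.append("uuid updated" if idle else "timed out")
    dev_lock.release()
    t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

If `wait_probe_idle()` kept `dev_lock` held across the wait, the probe thread could never take the lock and the writer would only return on the timeout, which is the user-space analogue of the two hung tasks in the trace.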