I've tested this patch against the fixed userspace part. This patch is real important, I hope it can make it into 4.11... Here's what happens when the userspace daemon is restarted, without the patch: [ 264.138953] ============================================= [ 264.139278] [ INFO: possible recursive locking detected ] [ 264.139607] 4.10.0-00007-ge98bdb3 #1 Not tainted [ 264.139883] --------------------------------------------- [ 264.140205] pvfs2-client-co/29032 is trying to acquire lock: [ 264.140565] (orangefs_request_mutex){+.+.+.}, at: [<ffffffffc060eed7>] service_operation+0x3c7/0x7b0 [orangefs] [ 264.141169] but task is already holding lock: [ 264.141518] (orangefs_request_mutex){+.+.+.}, at: [<ffffffffc060997f>] dispatch_ioctl_command+0x1bf/0x330 [orangefs] [ 264.142145] other info that might help us debug this: [ 264.142535] Possible unsafe locking scenario: [ 264.142888] CPU0 [ 264.143038] ---- [ 264.143187] lock(orangefs_request_mutex); [ 264.143441] lock(orangefs_request_mutex); [ 264.143692] *** DEADLOCK *** [ 264.144043] May be due to missing lock nesting notation [ 264.144447] 1 lock held by pvfs2-client-co/29032: [ 264.144728] #0: (orangefs_request_mutex){+.+.+.}, at: [<ffffffffc060997f>] dispatch_ioctl_command+0x1bf/0x330 [orangefs] [ 264.145380] stack backtrace: [ 264.145645] CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb3 #1 [ 264.146124] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014 [ 264.146639] Call Trace: [ 264.146792] dump_stack+0x86/0xc3 [ 264.146995] __lock_acquire+0x7eb/0x1290 [ 264.147232] ? add_lock_to_list.isra.28.constprop.43+0x8a/0x100 [ 264.147587] lock_acquire+0xe8/0x1d0 [ 264.147805] ? service_operation+0x3c7/0x7b0 [orangefs] [ 264.148119] mutex_lock_killable_nested+0x6f/0x6e0 [ 264.148407] ? service_operation+0x3c7/0x7b0 [orangefs] [ 264.148721] ? service_operation+0x3c7/0x7b0 [orangefs] [ 264.149034] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 264.149323] service_operation+0x3c7/0x7b0 [orangefs] [ 264.149628] ? remove_wait_queue+0x70/0x70 [ 264.149877] orangefs_remount+0xea/0x150 [orangefs] [ 264.150191] dispatch_ioctl_command+0x227/0x330 [orangefs] [ 264.150523] orangefs_devreq_ioctl+0x29/0x70 [orangefs] [ 264.150837] do_vfs_ioctl+0xa3/0x6e0 [ 264.151054] ? __fget+0x5/0x1f0 [ 264.151246] SyS_ioctl+0x79/0x90 [ 264.151445] entry_SYSCALL_64_fastpath+0x1f/0xc2 [ 264.151722] RIP: 0033:0x7f8ac4279ce7 [ 264.151939] RSP: 002b:00007f8ac0338e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 264.152389] RAX: ffffffffffffffda RBX: 00007f8ac0339700 RCX: 00007f8ac4279ce7 [ 264.152814] RDX: 0000000000000000 RSI: 0000000000006b05 RDI: 0000000000000005 [ 264.153239] RBP: 00007ffd6b889a10 R08: 0000000000611c00 R09: 00007f8ac0339700 [ 264.153664] R10: 00000000000000e8 R11: 0000000000000246 R12: 00007ffd6b889a0f [ 264.154089] R13: 0000000000001000 R14: 00007f8ac0339700 R15: 00007f8ac03399c0 Signed-off-by: Mike Marshall <hubcap@xxxxxxxxxxxx> On Thu, Apr 6, 2017 at 6:11 PM, Martin Brandenburg <martin@xxxxxxxxxxxx> wrote: > Without this fix (and another to the userspace component itself > described later), the kernel will be unable to process any OrangeFS > requests after the userspace component is restarted (due to a crash or > at the administrator's behest). > > The bug here is that inside orangefs_remount, the orangefs_request_mutex > is locked. When the userspace component restarts while the filesystem > is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device, > which causes the kernel to send it a few requests aimed at synchronizing > the state between the two. While this is happening the > orangefs_request_mutex is locked to prevent any other requests going > through. > > This is only half of the bugfix. The other half is in the userspace > component which outright ignores(!) requests made before it considers > the filesystem remounted, which is after the ioctl returns. Of course > the ioctl doesn't return until after the userspace component responds to > the request it ignores. The userspace component has been changed to > allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status. > > Signed-off-by: Martin Brandenburg <martin@xxxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > --- > fs/orangefs/super.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c > index 67c2435..cd261c8 100644 > --- a/fs/orangefs/super.c > +++ b/fs/orangefs/super.c > @@ -263,8 +263,13 @@ int orangefs_remount(struct orangefs_sb_info_s *orangefs_sb) > if (!new_op) > return -ENOMEM; > new_op->upcall.req.features.features = 0; > - ret = service_operation(new_op, "orangefs_features", 0); > - orangefs_features = new_op->downcall.resp.features.features; > + ret = service_operation(new_op, "orangefs_features", > + ORANGEFS_OP_PRIORITY | ORANGEFS_OP_NO_MUTEX); > + if (!ret) > + orangefs_features = > + new_op->downcall.resp.features.features; > + else > + orangefs_features = 0; > op_release(new_op); > } else { > orangefs_features = 0; > -- > 2.1.4 >