Thoughts about whether cc: stable? On Wed, May 3, 2017 at 10:54 AM, Rabin Vincent <rabin.vincent@xxxxxxxx> wrote: > From: Rabin Vincent <rabinv@xxxxxxxx> > > When the final cifsFileInfo_put() is called from cifsiod and an oplock > break work is queued, lockdep complains loudly: > > ============================================= > [ INFO: possible recursive locking detected ] > 4.11.0+ #21 Not tainted > --------------------------------------------- > kworker/0:2/78 is trying to acquire lock: > ("cifsiod"){++++.+}, at: flush_work+0x215/0x350 > > but task is already holding lock: > ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock("cifsiod"); > lock("cifsiod"); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by kworker/0:2/78: > #0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0 > #1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0 > > stack backtrace: > CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21 > Workqueue: cifsiod cifs_writev_complete > Call Trace: > dump_stack+0x85/0xc2 > __lock_acquire+0x17dd/0x2260 > ? match_held_lock+0x20/0x2b0 > ? trace_hardirqs_off_caller+0x86/0x130 > ? mark_lock+0xa6/0x920 > lock_acquire+0xcc/0x260 > ? lock_acquire+0xcc/0x260 > ? flush_work+0x215/0x350 > flush_work+0x236/0x350 > ? flush_work+0x215/0x350 > ? destroy_worker+0x170/0x170 > __cancel_work_timer+0x17d/0x210 > ? ___preempt_schedule+0x16/0x18 > cancel_work_sync+0x10/0x20 > cifsFileInfo_put+0x338/0x7f0 > cifs_writedata_release+0x2a/0x40 > ? cifs_writedata_release+0x2a/0x40 > cifs_writev_complete+0x29d/0x850 > ? preempt_count_sub+0x18/0xd0 > process_one_work+0x304/0x8e0 > worker_thread+0x9b/0x6a0 > kthread+0x1b2/0x200 > ? process_one_work+0x8e0/0x8e0 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x31/0x40 > > This is a real warning. Since the oplock is queued on the same > workqueue this can deadlock if there is only one worker thread active > for the workqueue (which will be the case during memory pressure when > the rescuer thread is handling it). > > Furthermore, there is at least one other kind of hang possible due to > the oplock break handling if there is only worker. (This can be > reproduced without introducing memory pressure by having passing 1 for > the max_active parameter of cifsiod.) cifs_oplock_break() can wait > indefintely in the filemap_fdatawait() while the cifs_writev_complete() > work is blocked: > > sysrq: SysRq : Show Blocked State > task PC stack pid father > kworker/0:1 D 0 16 2 0x00000000 > Workqueue: cifsiod cifs_oplock_break > Call Trace: > __schedule+0x562/0xf40 > ? mark_held_locks+0x4a/0xb0 > schedule+0x57/0xe0 > io_schedule+0x21/0x50 > wait_on_page_bit+0x143/0x190 > ? add_to_page_cache_lru+0x150/0x150 > __filemap_fdatawait_range+0x134/0x190 > ? do_writepages+0x51/0x70 > filemap_fdatawait_range+0x14/0x30 > filemap_fdatawait+0x3b/0x40 > cifs_oplock_break+0x651/0x710 > ? preempt_count_sub+0x18/0xd0 > process_one_work+0x304/0x8e0 > worker_thread+0x9b/0x6a0 > kthread+0x1b2/0x200 > ? process_one_work+0x8e0/0x8e0 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x31/0x40 > dd D 0 683 171 0x00000000 > Call Trace: > __schedule+0x562/0xf40 > ? mark_held_locks+0x29/0xb0 > schedule+0x57/0xe0 > io_schedule+0x21/0x50 > wait_on_page_bit+0x143/0x190 > ? add_to_page_cache_lru+0x150/0x150 > __filemap_fdatawait_range+0x134/0x190 > ? do_writepages+0x51/0x70 > filemap_fdatawait_range+0x14/0x30 > filemap_fdatawait+0x3b/0x40 > filemap_write_and_wait+0x4e/0x70 > cifs_flush+0x6a/0xb0 > filp_close+0x52/0xa0 > __close_fd+0xdc/0x150 > SyS_close+0x33/0x60 > entry_SYSCALL_64_fastpath+0x1f/0xbe > > Showing all locks held in the system: > 2 locks held by kworker/0:1/16: > #0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0 > #1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0 > > Showing busy workqueues and worker pools: > workqueue cifsiod: flags=0xc > pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 > in-flight: 16:cifs_oplock_break > delayed: cifs_writev_complete, cifs_echo_request > pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3 > > Fix these problems by creating a a new workqueue (with a rescuer) for > the oplock break work. > > Signed-off-by: Rabin Vincent <rabinv@xxxxxxxx> > --- > fs/cifs/cifsfs.c | 15 +++++++++++++-- > fs/cifs/cifsglob.h | 1 + > fs/cifs/misc.c | 2 +- > fs/cifs/smb2misc.c | 5 +++-- > 4 files changed, 18 insertions(+), 5 deletions(-) > > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > index 34fee9f..5c52c8f 100644 > --- a/fs/cifs/cifsfs.c > +++ b/fs/cifs/cifsfs.c > @@ -87,6 +87,7 @@ extern mempool_t *cifs_req_poolp; > extern mempool_t *cifs_mid_poolp; > > struct workqueue_struct *cifsiod_wq; > +struct workqueue_struct *cifsoplockd_wq; > __u32 cifs_lock_secret; > > /* > @@ -1374,9 +1375,16 @@ init_cifs(void) > goto out_clean_proc; > } > > + cifsoplockd_wq = alloc_workqueue("cifsoplockd", > + WQ_FREEZABLE|WQ_MEM_RECLAIM, 0); > + if (!cifsoplockd_wq) { > + rc = -ENOMEM; > + goto out_destroy_cifsiod_wq; > + } > + > rc = cifs_fscache_register(); > if (rc) > - goto out_destroy_wq; > + goto out_destroy_cifsoplockd_wq; > > rc = cifs_init_inodecache(); > if (rc) > @@ -1424,7 +1432,9 @@ init_cifs(void) > cifs_destroy_inodecache(); > out_unreg_fscache: > cifs_fscache_unregister(); > -out_destroy_wq: > +out_destroy_cifsoplockd_wq: > + destroy_workqueue(cifsoplockd_wq); > +out_destroy_cifsiod_wq: > destroy_workqueue(cifsiod_wq); > out_clean_proc: > cifs_proc_clean(); > @@ -1447,6 +1457,7 @@ exit_cifs(void) > cifs_destroy_mids(); > cifs_destroy_inodecache(); > cifs_fscache_unregister(); > + destroy_workqueue(cifsoplockd_wq); > destroy_workqueue(cifsiod_wq); > cifs_proc_clean(); > } > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h > index 37f5a41..17f0e73 100644 > --- a/fs/cifs/cifsglob.h > +++ b/fs/cifs/cifsglob.h > @@ -1683,6 +1683,7 @@ void cifs_oplock_break(struct work_struct *work); > > extern const struct slow_work_ops cifs_oplock_break_ops; > extern struct workqueue_struct *cifsiod_wq; > +extern struct workqueue_struct *cifsoplockd_wq; > extern __u32 cifs_lock_secret; > > extern mempool_t *cifs_mid_poolp; > diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c > index d3fb115..b578c6d 100644 > --- a/fs/cifs/misc.c > +++ b/fs/cifs/misc.c > @@ -492,7 +492,7 @@ is_valid_oplock_break(char *buffer, struct TCP_Server_Info *srv) > CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2, > &pCifsInode->flags); > > - queue_work(cifsiod_wq, > + queue_work(cifsoplockd_wq, > &netfile->oplock_break); > netfile->oplock_break_cancelled = false; > > diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c > index 1a04b3a..7b08a14 100644 > --- a/fs/cifs/smb2misc.c > +++ b/fs/cifs/smb2misc.c > @@ -499,7 +499,7 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp, > else > cfile->oplock_break_cancelled = true; > > - queue_work(cifsiod_wq, &cfile->oplock_break); > + queue_work(cifsoplockd_wq, &cfile->oplock_break); > kfree(lw); > return true; > } > @@ -643,7 +643,8 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server) > CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2, > &cinode->flags); > spin_unlock(&cfile->file_info_lock); > - queue_work(cifsiod_wq, &cfile->oplock_break); > + queue_work(cifsoplockd_wq, > + &cfile->oplock_break); > > spin_unlock(&tcon->open_file_lock); > spin_unlock(&cifs_tcp_ses_lock); > -- > 2.1.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Thanks, Steve -- To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html