Re: [PATCH] CIFS: fix oplock break deadlocks

Steve French <smfrench@xxxxxxxxx> · Wed, 3 May 2017 11:18:11 -0500



merged into cifs-2.6.git for-next

cc: stabled

On Wed, May 3, 2017 at 11:04 AM, Steve French <smfrench@xxxxxxxxx> wrote:
> Thoughts about whether cc: stable?
>
> On Wed, May 3, 2017 at 10:54 AM, Rabin Vincent <rabin.vincent@xxxxxxxx> wrote:
>> From: Rabin Vincent <rabinv@xxxxxxxx>
>>
>> When the final cifsFileInfo_put() is called from cifsiod and an oplock
>> break work is queued, lockdep complains loudly:
>>
>>  =============================================
>>  [ INFO: possible recursive locking detected ]
>>  4.11.0+ #21 Not tainted
>>  ---------------------------------------------
>>  kworker/0:2/78 is trying to acquire lock:
>>   ("cifsiod"){++++.+}, at: flush_work+0x215/0x350
>>
>>  but task is already holding lock:
>>   ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
>>
>>  other info that might help us debug this:
>>   Possible unsafe locking scenario:
>>
>>         CPU0
>>         ----
>>    lock("cifsiod");
>>    lock("cifsiod");
>>
>>   *** DEADLOCK ***
>>
>>   May be due to missing lock nesting notation
>>
>>  2 locks held by kworker/0:2/78:
>>   #0:  ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
>>   #1:  ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0
>>
>>  stack backtrace:
>>  CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
>>  Workqueue: cifsiod cifs_writev_complete
>>  Call Trace:
>>   dump_stack+0x85/0xc2
>>   __lock_acquire+0x17dd/0x2260
>>   ? match_held_lock+0x20/0x2b0
>>   ? trace_hardirqs_off_caller+0x86/0x130
>>   ? mark_lock+0xa6/0x920
>>   lock_acquire+0xcc/0x260
>>   ? lock_acquire+0xcc/0x260
>>   ? flush_work+0x215/0x350
>>   flush_work+0x236/0x350
>>   ? flush_work+0x215/0x350
>>   ? destroy_worker+0x170/0x170
>>   __cancel_work_timer+0x17d/0x210
>>   ? ___preempt_schedule+0x16/0x18
>>   cancel_work_sync+0x10/0x20
>>   cifsFileInfo_put+0x338/0x7f0
>>   cifs_writedata_release+0x2a/0x40
>>   ? cifs_writedata_release+0x2a/0x40
>>   cifs_writev_complete+0x29d/0x850
>>   ? preempt_count_sub+0x18/0xd0
>>   process_one_work+0x304/0x8e0
>>   worker_thread+0x9b/0x6a0
>>   kthread+0x1b2/0x200
>>   ? process_one_work+0x8e0/0x8e0
>>   ? kthread_create_on_node+0x40/0x40
>>   ret_from_fork+0x31/0x40
>>
>> This is a real warning.  Since the oplock is queued on the same
>> workqueue this can deadlock if there is only one worker thread active
>> for the workqueue (which will be the case during memory pressure when
>> the rescuer thread is handling it).
>>
>> Furthermore, there is at least one other kind of hang possible due to
>> the oplock break handling if there is only worker.  (This can be
>> reproduced without introducing memory pressure by having passing 1 for
>> the max_active parameter of cifsiod.) cifs_oplock_break() can wait
>> indefintely in the filemap_fdatawait() while the cifs_writev_complete()
>> work is blocked:
>>
>>  sysrq: SysRq : Show Blocked State
>>    task                        PC stack   pid father
>>  kworker/0:1     D    0    16      2 0x00000000
>>  Workqueue: cifsiod cifs_oplock_break
>>  Call Trace:
>>   __schedule+0x562/0xf40
>>   ? mark_held_locks+0x4a/0xb0
>>   schedule+0x57/0xe0
>>   io_schedule+0x21/0x50
>>   wait_on_page_bit+0x143/0x190
>>   ? add_to_page_cache_lru+0x150/0x150
>>   __filemap_fdatawait_range+0x134/0x190
>>   ? do_writepages+0x51/0x70
>>   filemap_fdatawait_range+0x14/0x30
>>   filemap_fdatawait+0x3b/0x40
>>   cifs_oplock_break+0x651/0x710
>>   ? preempt_count_sub+0x18/0xd0
>>   process_one_work+0x304/0x8e0
>>   worker_thread+0x9b/0x6a0
>>   kthread+0x1b2/0x200
>>   ? process_one_work+0x8e0/0x8e0
>>   ? kthread_create_on_node+0x40/0x40
>>   ret_from_fork+0x31/0x40
>>  dd              D    0   683    171 0x00000000
>>  Call Trace:
>>   __schedule+0x562/0xf40
>>   ? mark_held_locks+0x29/0xb0
>>   schedule+0x57/0xe0
>>   io_schedule+0x21/0x50
>>   wait_on_page_bit+0x143/0x190
>>   ? add_to_page_cache_lru+0x150/0x150
>>   __filemap_fdatawait_range+0x134/0x190
>>   ? do_writepages+0x51/0x70
>>   filemap_fdatawait_range+0x14/0x30
>>   filemap_fdatawait+0x3b/0x40
>>   filemap_write_and_wait+0x4e/0x70
>>   cifs_flush+0x6a/0xb0
>>   filp_close+0x52/0xa0
>>   __close_fd+0xdc/0x150
>>   SyS_close+0x33/0x60
>>   entry_SYSCALL_64_fastpath+0x1f/0xbe
>>
>>  Showing all locks held in the system:
>>  2 locks held by kworker/0:1/16:
>>   #0:  ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
>>   #1:  ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
>>
>>  Showing busy workqueues and worker pools:
>>  workqueue cifsiod: flags=0xc
>>    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
>>      in-flight: 16:cifs_oplock_break
>>      delayed: cifs_writev_complete, cifs_echo_request
>>  pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3
>>
>> Fix these problems by creating a a new workqueue (with a rescuer) for
>> the oplock break work.
>>
>> Signed-off-by: Rabin Vincent <rabinv@xxxxxxxx>
>> ---
>>  fs/cifs/cifsfs.c   | 15 +++++++++++++--
>>  fs/cifs/cifsglob.h |  1 +
>>  fs/cifs/misc.c     |  2 +-
>>  fs/cifs/smb2misc.c |  5 +++--
>>  4 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
>> index 34fee9f..5c52c8f 100644
>> --- a/fs/cifs/cifsfs.c
>> +++ b/fs/cifs/cifsfs.c
>> @@ -87,6 +87,7 @@ extern mempool_t *cifs_req_poolp;
>>  extern mempool_t *cifs_mid_poolp;
>>
>>  struct workqueue_struct        *cifsiod_wq;
>> +struct workqueue_struct        *cifsoplockd_wq;
>>  __u32 cifs_lock_secret;
>>
>>  /*
>> @@ -1374,9 +1375,16 @@ init_cifs(void)
>>                 goto out_clean_proc;
>>         }
>>
>> +       cifsoplockd_wq = alloc_workqueue("cifsoplockd",
>> +                                        WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
>> +       if (!cifsoplockd_wq) {
>> +               rc = -ENOMEM;
>> +               goto out_destroy_cifsiod_wq;
>> +       }
>> +
>>         rc = cifs_fscache_register();
>>         if (rc)
>> -               goto out_destroy_wq;
>> +               goto out_destroy_cifsoplockd_wq;
>>
>>         rc = cifs_init_inodecache();
>>         if (rc)
>> @@ -1424,7 +1432,9 @@ init_cifs(void)
>>         cifs_destroy_inodecache();
>>  out_unreg_fscache:
>>         cifs_fscache_unregister();
>> -out_destroy_wq:
>> +out_destroy_cifsoplockd_wq:
>> +       destroy_workqueue(cifsoplockd_wq);
>> +out_destroy_cifsiod_wq:
>>         destroy_workqueue(cifsiod_wq);
>>  out_clean_proc:
>>         cifs_proc_clean();
>> @@ -1447,6 +1457,7 @@ exit_cifs(void)
>>         cifs_destroy_mids();
>>         cifs_destroy_inodecache();
>>         cifs_fscache_unregister();
>> +       destroy_workqueue(cifsoplockd_wq);
>>         destroy_workqueue(cifsiod_wq);
>>         cifs_proc_clean();
>>  }
>> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
>> index 37f5a41..17f0e73 100644
>> --- a/fs/cifs/cifsglob.h
>> +++ b/fs/cifs/cifsglob.h
>> @@ -1683,6 +1683,7 @@ void cifs_oplock_break(struct work_struct *work);
>>
>>  extern const struct slow_work_ops cifs_oplock_break_ops;
>>  extern struct workqueue_struct *cifsiod_wq;
>> +extern struct workqueue_struct *cifsoplockd_wq;
>>  extern __u32 cifs_lock_secret;
>>
>>  extern mempool_t *cifs_mid_poolp;
>> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
>> index d3fb115..b578c6d 100644
>> --- a/fs/cifs/misc.c
>> +++ b/fs/cifs/misc.c
>> @@ -492,7 +492,7 @@ is_valid_oplock_break(char *buffer, struct TCP_Server_Info *srv)
>>                                            CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2,
>>                                            &pCifsInode->flags);
>>
>> -                               queue_work(cifsiod_wq,
>> +                               queue_work(cifsoplockd_wq,
>>                                            &netfile->oplock_break);
>>                                 netfile->oplock_break_cancelled = false;
>>
>> diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
>> index 1a04b3a..7b08a14 100644
>> --- a/fs/cifs/smb2misc.c
>> +++ b/fs/cifs/smb2misc.c
>> @@ -499,7 +499,7 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp,
>>                 else
>>                         cfile->oplock_break_cancelled = true;
>>
>> -               queue_work(cifsiod_wq, &cfile->oplock_break);
>> +               queue_work(cifsoplockd_wq, &cfile->oplock_break);
>>                 kfree(lw);
>>                 return true;
>>         }
>> @@ -643,7 +643,8 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server)
>>                                            CIFS_INODE_DOWNGRADE_OPLOCK_TO_L2,
>>                                            &cinode->flags);
>>                                 spin_unlock(&cfile->file_info_lock);
>> -                               queue_work(cifsiod_wq, &cfile->oplock_break);
>> +                               queue_work(cifsoplockd_wq,
>> +                                          &cfile->oplock_break);
>>
>>                                 spin_unlock(&tcon->open_file_lock);
>>                                 spin_unlock(&cifs_tcp_ses_lock);
>> --
>> 2.1.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Thanks,
>
> Steve


-- 
Thanks,

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html