Just a thought, although leasing code does not cause such deadlocks with rwsems, perhaps leasing and oplock code can reside on one workqueue! On Fri, Mar 21, 2014 at 10:07 AM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > We just had a customer report a deadlock in their 3.8-rt kernel. > Looking into this, it is very possible to have the same deadlock in > mainline in Linus's tree as it stands today. > > It is much easier to deadlock in the -rt kernel because reader locks > are serialized, where a down_read() can block another down_read(). But > because rwsems are fair locks, if a writer is waiting, a new reader > will then block. This means that if it is possible for a reader to > deadlock another reader, this can happen if a write comes along and > blocks on a current reader. That will prevent another reader from > running, and if that new reader requires to wake up a reader that owns > the lock, you have your deadlock. > > Here's the situation with CIFS and workqueues: > > The cifs system has several workqueues used in file.c and other places. > One of them is used for completion of a read and to release the > page_lock which wakes up the reader. There are several other workqueues > that do various other tasks. > > A holder of the reader lock can sleep on a page_lock() and expect the > reader workqueue to wake it up (page_unlock()). The reader workqueue > takes no locks so this does not seem to be a problem (but it is). > > The other workqueues can take the rwsem for read or for write. But our > issue that we tripped over was that it grabs it for read (remember in > -rt readers are serialized). But this can also happen if a separate > writer is waiting on the lock as that would cause a reader to block on > another reader too. > > All the workqueue callbacks are executed on the same workqueue: > > queue_work(cifsiod_wq, &rdata->work); > [...] > queue_work(cifsiod_wq, &cfile->oplock_break); > > Now if the reader workqueue callback is queued after one of these > workqueues that can take the rwsem, we can hit a deadlock. The > workqueue code looks to be able to prevent deadlocks of these kinds, > but I do not totally understand the workqueue scheduled work structure > and perhaps if the kworker thread structure blocks hard it wont move > works around. > > Here's what we see: > > rdata->work is scheduled after cfile->oplock_break > > CPU0 CPU1 > ---- ---- > > do_sync_read() > cifs_strict_readv() > down_read(cinode->lock_sem); > generic_file_aio_read() > __lock_page_killable() > __wait_on_bit_lock() > > * BLOCKED * > > process_one_work() > cifs_oplock_break() > cifs_has_mand_locks() > down_read(cinode->lock_sem); > > * BLOCKED * > > [ note, cifs_oplock_break() can > also call cifs_push_locks which takes > the lock with down_write() ] > > The key to remember is that the work to wake up the lock owner is queued > behind the oplock work which is blocked on the same lock. > > We noticed that the rdata->work was queued to run under the same > workqueue task and this work is to wake up the owner of the semaphore. > But because the workqueue task is blocked waiting on that lock, it will > never wake it up. > > By adding another workqueue that runs all the work that might take a mutex > we should be able to avoid this deadlock. > > Link: http://lkml.kernel.org/r/20140319151252.16ed3ac6@xxxxxxxxxxxxxxxxxx > > Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx> > --- > fs/cifs/cifsfs.c | 17 ++++++++++++++++- > fs/cifs/cifsglob.h | 1 + > fs/cifs/misc.c | 2 +- > fs/cifs/smb2misc.c | 6 +++--- > 4 files changed, 21 insertions(+), 5 deletions(-) > > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c > index 849f613..b0761c8 100644 > --- a/fs/cifs/cifsfs.c > +++ b/fs/cifs/cifsfs.c > @@ -86,6 +86,12 @@ extern mempool_t *cifs_req_poolp; > extern mempool_t *cifs_mid_poolp; > > struct workqueue_struct *cifsiod_wq; > +/* > + * The oplock workqueue must be separate to prevent it from blocking > + * other work that is queued. Work that requires to grab mutex locks > + * must use the 'l' version. > + */ > +struct workqueue_struct *cifsiold_wq; > > #ifdef CONFIG_CIFS_SMB2 > __u8 cifs_client_guid[SMB2_CLIENT_GUID_SIZE]; > @@ -1199,9 +1205,15 @@ init_cifs(void) > goto out_clean_proc; > } > > + cifsiold_wq = alloc_workqueue("cifsiold", WQ_FREEZABLE|WQ_MEM_RECLAIM, 0); > + if (!cifsiold_wq) { > + rc = -ENOMEM; > + goto out_destroy_wq; > + } > + > rc = cifs_fscache_register(); > if (rc) > - goto out_destroy_wq; > + goto out_destroy_rwq; > > rc = cifs_init_inodecache(); > if (rc) > @@ -1249,6 +1261,8 @@ out_destroy_inodecache: > cifs_destroy_inodecache(); > out_unreg_fscache: > cifs_fscache_unregister(); > +out_destroy_rwq: > + destroy_workqueue(cifsiold_wq); > out_destroy_wq: > destroy_workqueue(cifsiod_wq); > out_clean_proc: > @@ -1273,6 +1287,7 @@ exit_cifs(void) > cifs_destroy_inodecache(); > cifs_fscache_unregister(); > destroy_workqueue(cifsiod_wq); > + destroy_workqueue(cifsiold_wq); > cifs_proc_clean(); > } > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h > index c0f3718..6c2b5c8 100644 > --- a/fs/cifs/cifsglob.h > +++ b/fs/cifs/cifsglob.h > @@ -1561,6 +1561,7 @@ void cifs_oplock_break(struct work_struct *work); > > extern const struct slow_work_ops cifs_oplock_break_ops; > extern struct workqueue_struct *cifsiod_wq; > +extern struct workqueue_struct *cifsiold_wq; > > extern mempool_t *cifs_mid_poolp; > > diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c > index 2f9f379..1bc94e9 100644 > --- a/fs/cifs/misc.c > +++ b/fs/cifs/misc.c > @@ -468,7 +468,7 @@ is_valid_oplock_break(char *buffer, struct TCP_Server_Info *srv) > > cifs_set_oplock_level(pCifsInode, > pSMB->OplockLevel ? OPLOCK_READ : 0); > - queue_work(cifsiod_wq, > + queue_work(cifsiold_wq, > &netfile->oplock_break); > netfile->oplock_break_cancelled = false; > > diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c > index fb39662..ffea93f 100644 > --- a/fs/cifs/smb2misc.c > +++ b/fs/cifs/smb2misc.c > @@ -447,7 +447,7 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp, > else > cfile->oplock_break_cancelled = true; > > - queue_work(cifsiod_wq, &cfile->oplock_break); > + queue_work(cifsiold_wq, &cfile->oplock_break); > kfree(lw); > return true; > } > @@ -463,7 +463,7 @@ smb2_tcon_has_lease(struct cifs_tcon *tcon, struct smb2_lease_break *rsp, > memcpy(lw->lease_key, open->lease_key, > SMB2_LEASE_KEY_SIZE); > lw->tlink = cifs_get_tlink(open->tlink); > - queue_work(cifsiod_wq, &lw->lease_break); > + queue_work(cifsiold_wq, &lw->lease_break); > } > > cifs_dbg(FYI, "found in the pending open list\n"); > @@ -579,7 +579,7 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server) > rsp->OplockLevel ? SMB2_OPLOCK_LEVEL_II : 0, > 0, NULL); > > - queue_work(cifsiod_wq, &cfile->oplock_break); > + queue_work(cifsiold_wq, &cfile->oplock_break); > > spin_unlock(&cifs_file_list_lock); > spin_unlock(&cifs_tcp_ses_lock); > -- > 1.8.1.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html