Re: [PATCH] ceph: when filling trace, do ceph_get_inode outside of mutexes

On Fri, 2020-11-13 at 10:40 +0000, Luis Henriques wrote:
> Jeff Layton <jlayton@xxxxxxxxxx> writes:
> 
> > On Thu, 2020-11-12 at 13:15 -0500, Jeff Layton wrote:
> > > Geng Jichao reported a rather complex deadlock involving several
> > > moving parts:
> > > 
> > > 1) readahead is issued against an inode and some of its pages are locked
> > >    while the read is in flight
> > > 
> > > 2) the same inode is evicted from the cache, and this task gets stuck
> > >    waiting for the page lock because of the above readahead
> > > 
> > > 3) another task is processing a reply trace, and looks up the inode
> > >    being evicted while holding the s_mutex. That ends up waiting for the
> > >    eviction to complete
> > > 
> > > 4) a write reply for an unrelated inode is then processed in the
> > >    ceph_con_workfn job. It calls ceph_check_caps after putting wrbuffer
> > >    caps, and that gets stuck waiting on the s_mutex held by 3.
> > > 
> > > The reply to "1" is stuck behind the write reply in "4", so we deadlock
> > > at that point.
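To spell the cycle out, here is a rough sketch of the wait-for graph (the
task labels are illustrative, not actual kernel thread names):

```
T1 (readahead)       : holds page lock on P    -> waits for its read reply
T2 (evict_inode)     : tearing down inode I    -> waits for page lock on P
T3 (trace processing): holds s_mutex           -> ceph_get_inode(I) waits for
                                                  T2's eviction to complete
T4 (ceph_con_workfn) : handling a write reply  -> ceph_check_caps waits for
                                                  the s_mutex held by T3

T1's read reply is queued behind the write reply being handled in T4's
workfn, so every edge points forward and nothing can make progress.
```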
> > > 
> > > This patch changes the trace processing to call ceph_get_inode outside
> > > of the s_mutex and snap_rwsem, which should break the cycle above.
> > > 
> > > URL: https://tracker.ceph.com/issues/47998
> > > Reported-by: Geng Jichao <gengjichao@xxxxxx>
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > >  fs/ceph/inode.c      | 13 ++++---------
> > >  fs/ceph/mds_client.c | 15 +++++++++++++++
> > >  2 files changed, 19 insertions(+), 9 deletions(-)
> > > 
> > 
> > My belief is that this patch probably won't hurt anything and might fix
> > this specific deadlock. I do however wonder if we might have a whole
> > class of deadlocks of this nature. I can't quite prove it by inspection
> > though, so testing with this would be a good datapoint.
> 
> Yeah, I keep staring at this and can't decide either.  But I agree with
> you: this may or may not fix the issue, but shouldn't hurt.  Just a minor
> comment below:
> 
> > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > > index 88bbeb05bd27..9b85d86d8efb 100644
> > > --- a/fs/ceph/inode.c
> > > +++ b/fs/ceph/inode.c
> > > @@ -1315,15 +1315,10 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
> > >  	}
> > >  
> > >  	if (rinfo->head->is_target) {
> > > -		tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
> > > -		tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
> > > -
> > > -		in = ceph_get_inode(sb, tvino);
> > > -		if (IS_ERR(in)) {
> > > -			err = PTR_ERR(in);
> > > -			goto done;
> > > -		}
> > > +		/* Should be filled in by handle_reply */
> > > +		BUG_ON(!req->r_target_inode);
> > >  
> > > +		in = req->r_target_inode;
> > >  		err = ceph_fill_inode(in, req->r_locked_page, &rinfo->targeti,
> > >  				NULL, session,
> > >  				(!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags) &&
> > > @@ -1333,13 +1328,13 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
> > >  		if (err < 0) {
> > >  			pr_err("ceph_fill_inode badness %p %llx.%llx\n",
> > >  				in, ceph_vinop(in));
> > > +			req->r_target_inode = NULL;
> > >  			if (in->i_state & I_NEW)
> > >  				discard_new_inode(in);
> > >  			else
> > >  				iput(in);
> > >  			goto done;
> > >  		}
> > > -		req->r_target_inode = in;
> > >  		if (in->i_state & I_NEW)
> > >  			unlock_new_inode(in);
> > >  	}
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index b3c941503f8c..48db1f593d81 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -3179,6 +3179,21 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
> > >  		err = parse_reply_info(session, msg, rinfo, session->s_con.peer_features);
> > >  	mutex_unlock(&mdsc->mutex);
> > >  
> > > +	/* Must find target inode outside of mutexes to avoid deadlocks */
> > > +	if (rinfo->head->is_target) {
> 
> This condition should probably be
> 
> 	if ((err >= 0) && rinfo->head->is_target) {
> 
> This would allow a faster exit path and would avoid potentially masking
> this 'err' value with another one from ceph_get_inode().
> 


Good catch. I'll fix that and send a v2.
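
For the archive, the adjusted hunk in handle_reply() would read something
like this (a sketch only; the actual v2 may differ slightly):

```
 	mutex_unlock(&mdsc->mutex);

 	/* Must find target inode outside of mutexes to avoid deadlocks */
-	if (rinfo->head->is_target) {
+	if (err >= 0 && rinfo->head->is_target) {
```

That way a parse_reply_info() failure skips the lookup entirely and 'err'
is reported as-is.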

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>



