Re: [PATCH] sunrpc: remove unnecessary svc_xprt_put

Neil Brown <neilb@xxxxxxx> · Sat, 27 Feb 2010 12:35:37 +1100

On Fri, 26 Feb 2010 18:40:58 -0600
Tom Tucker <tom@xxxxxxxxxxxxxxxxxxxxx> wrote:

> J. Bruce Fields wrote:
> > On Sat, Feb 27, 2010 at 09:33:40AM +1100, Neil Brown wrote:
> >     
> >> [I found this while looking for the current refcount problem
> >>  that triggers a warning in svc_recv.  This isn't that bug
> >>  but is a different refcount bug - NB]
> >>       
> >
> >     
> 
> I seem to recall that we added that reference for a reason. There was
> an issue with unmount while there were deferrals pending. That's why the 
> reference was added.
> 
> Tom

What reference?
What I (thought I) found was code that was dropping a reference which it
didn't hold.  Are you saying that it is supposed to be holding a reference
here, but isn't, or that it really is holding a reference here and I didn't
see it?

And just for completeness, my understanding of the refcounting here is:

A counted reference is held on an svc_xprt when:
 - a 'struct svc_rqst' refers to it through ->rq_xprt
 - a 'cache_deferred_req' refers to it through ->xprt
    This only happens while the req is waiting to be
    revisited, and is in the hash table and on the lru.
    Once the req gets revisited (svc_revisit) ->xprt
    is set to NULL and the reference is dropped.
 - XPT_DEAD is *not* set.  So the refcount is initialised
   to '1' to reflect this, and this ref is dropped
   when we set XPT_DEAD.
 - there are a few transient references in svc_xprt.c
   which very clearly have matched 'get' and 'put'.
 - svc_find_xprt returns a counted reference.  This is
   called once in lockd and once in nfsd, and both
   calls drop the ref correctly.
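
The rules in the list above can be sketched as a small userspace model.
This is only an illustration of the accounting as I understand it, not
the kernel code: every toy_* name is hypothetical, the plain int stands
in for the real refcount, and 'dead' stands in for XPT_DEAD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct toy_xprt {
    int refcount;   /* stands in for the refcount on svc_xprt */
    int dead;       /* stands in for the XPT_DEAD flag */
    int freed;      /* set once the last reference has been dropped */
};

struct toy_deferred {
    struct toy_xprt *xprt;  /* counted ref while waiting to be revisited */
};

static void toy_xprt_init(struct toy_xprt *x)
{
    x->refcount = 1;    /* the reference that means "XPT_DEAD is not set" */
    x->dead = 0;
    x->freed = 0;
}

static void toy_xprt_get(struct toy_xprt *x)
{
    assert(x->refcount > 0);    /* never take a ref on a freed object */
    x->refcount++;
}

static void toy_xprt_put(struct toy_xprt *x)
{
    assert(x->refcount > 0);    /* a put without a matching get trips here */
    if (--x->refcount == 0)
        x->freed = 1;
}

/* Deferring a request stores a counted reference in ->xprt. */
static void toy_defer(struct toy_deferred *d, struct toy_xprt *x)
{
    toy_xprt_get(x);
    d->xprt = x;
}

/* Revisiting clears the pointer and drops the reference it held. */
static void toy_revisit(struct toy_deferred *d)
{
    struct toy_xprt *x = d->xprt;

    d->xprt = NULL;
    toy_xprt_put(x);
}

/* Marking the transport dead drops the initial reference, exactly once. */
static void toy_delete(struct toy_xprt *x)
{
    if (!x->dead) {
        x->dead = 1;
        toy_xprt_put(x);
    }
}
```

In this model a put that is not paired with a get shows up as the
assertion in toy_xprt_put() firing, or as 'freed' being set while a
pointer still refers to the object.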

Whenever we drop a counted ref that was stored in a pointer, we set that
pointer to NULL.
So if there was a race where two threads both get a reference from a pointer
and then drop that reference, you would expect that slightly different timing
would cause one of those threads to get a NULL from the pointer, dereference
it, and crash.  There are no important tests-for-NULL on either of the
pointers in question, so that wouldn't be protecting us from a crash.  But
we don't see that crash, so there cannot be a race there.

So: The refcount cannot possibly be zero in svc_recv :-)

I just noticed some slightly odd code later in svc_recv:

 if (XPT_LISTENER && !XPT_CLOSE) {
     ...
 } else if (!XPT_CLOSE) {
     ...
     ->xpo_recvfrom()
 }
 if (XPT_CLOSE) {
     ...
     svc_delete_xprt()
 }

 So if XPT_CLOSE is set while xpo_recvfrom is being called, which I think
 is possible, and if ->xpo_recvfrom returns non-zero, then we end up
 processing a request on a dead socket, which doesn't sound like the right
 thing to do.  I don't think it can cause the present problem, but
 it looks wrong.  That last 'if' should just be an 'else'.
 I guess that would effectively revert b0401d7253, though.  Not that
 that patch seems entirely right to me: if there is a problem there, I
 probably would have fixed it differently, though I'm not sure how.
 So maybe change "if (XPT_CLOSE)" to "if (len <= 0 && XPT_CLOSE)" ???
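
 To make the two variants concrete, here is a rough userspace model of
 that control flow.  It is a sketch under assumptions, not the real
 svc_recv: the toy_* names, the 'close_during_recv' knob (which
 simulates XPT_CLOSE being set while xpo_recvfrom runs) and the
 'guard_on_len' switch are all made up for illustration, and it assumes
 the first two branches are only taken when XPT_CLOSE is not yet set.

```c
#include <assert.h>

enum { TOY_LISTENER = 1 << 0, TOY_CLOSE = 1 << 1 };

struct toy_result {
    int processed;  /* a request was handed on for processing */
    int deleted;    /* the transport was deleted in this pass */
};

/*
 * flags:             initial transport flags
 * recv_len:          what the simulated xpo_recvfrom returns
 * close_during_recv: simulate XPT_CLOSE being set during the receive
 * guard_on_len:      0 = "if (XPT_CLOSE)", 1 = "if (len <= 0 && XPT_CLOSE)"
 */
static struct toy_result toy_recv(unsigned int flags, int recv_len,
                                  int close_during_recv, int guard_on_len)
{
    struct toy_result r = { 0, 0 };
    int len = 0;

    if ((flags & TOY_LISTENER) && !(flags & TOY_CLOSE)) {
        /* accept a new connection; len stays 0 */
    } else if (!(flags & TOY_CLOSE)) {
        len = recv_len;             /* stands in for ->xpo_recvfrom() */
        if (close_during_recv)
            flags |= TOY_CLOSE;     /* the close races with the receive */
    }

    if ((!guard_on_len || len <= 0) && (flags & TOY_CLOSE))
        r.deleted = 1;              /* stands in for svc_delete_xprt() */

    if (len > 0)
        r.processed = 1;            /* request goes on to be processed */

    return r;
}
```

 With guard_on_len == 0 and a close racing with a successful receive,
 the model both deletes the transport and processes the request, which
 is the case that looks wrong to me.  With guard_on_len == 1 the request
 is processed and the delete is left for a later pass; whether a later
 pass reliably happens in the real code is something I have not checked.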

NeilBrown
