Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports

Tom Tucker <tom@xxxxxxxxxxxxxxxxxxxxx> · Tue, 23 Dec 2008 17:39:16 -0600

Tom Tucker wrote:
Trond Myklebust wrote:
On Wed, 2008-12-17 at 09:35 -0600, Tom Tucker wrote:

Trond Myklebust wrote:

Aside from being racy (there is nothing preventing someone setting 
XPT_DEAD
after the test in svc_xprt_enqueue, and before XPT_BUSY is set), it is
wrong to assume that transports which have called svc_delete_xprt() 
might
not need to be re-enqueued.

This is only true because now you allow transports with XPT_DEAD set 
to be enqueued -- yes?

See the list of deferred requests, which is currently never going to
be cleared if the revisit call happens after svc_delete_xprt(). In 
this
case, the deferred request will currently keep a reference to the 
transport
forever.

I agree this is a possibility and it needs to be fixed. I'm 
concerned that the root cause is still there though. I thought the 
test case was the client side timing out the connection. Why are 
there deferred requests sitting on what is presumably an idle 
connection?

I haven't said that they are the cause of this test case. I've said that
deferred requests hold references to the socket that can obviously
deadlock. That needs to be fixed regardless of whether or not it is the
cause here.

There are plenty of situations in which the client may choose to close
the TCP socket even if there are outstanding requests. One of the most
common is when the user signals the process, so that an RPC call that
was partially transmitted (ran out of buffer space) gets cancelled
before it can finish transmitting. In that case the client has no choice
but to disconnect and immediately reconnect.

The fix should be to allow dead transports to be enqueued in order 
to clear
the deferred requests, then change the order of processing in 
svc_recv() so
that we pick up deferred requests before we do the XPT_CLOSE 
processing.

Wouldn't it be simpler to clean up any deferred requests in the 
close path instead of changing the meaning of XPT_DEAD and 
dispatching N-threads to do the same?

AFAICS, deferred requests are the property of the cache until they
expire or a downcall occurs. I'm not aware of any way to cancel only
those deferred requests that hold a reference to this particular
transport.

Ok, I think you're right, and I think that this fix is correct and 
makes the symptom go away.

I may be completely confused here, but:

- The deferred requests should be getting cleaned up by timing out, 
and that does not not seem to be happening, (Is this true?)

They are getting "cleaned up", but by the time they do the transport is 
dead, the request has been added to the deferred queue, but it won't get 
processed because svc_xprt_enqueue won't "schedule" a dead transport.

- By releasing the underlying connection prior to releasing the 
transport that manages it, we've converted the visible resource leek 
to an invisible one.

Not with your changes per the above.

- This has been around forever and changing the client side close 
behavior graceful exposed this bug,

So I'm wondering if what we want to do here is to provide a mechanism 
for canceling deferred requests for a particular transport. This would 
provide a mechanism for the generic transport driver to force 
cancellation of deferred requests when closing. 
This is a new interface and we'd still need to handle requests sitting 
on the transport's deferred queue. Probably not a good idea.

Tom

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html