Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 2 Jul 2009 16:04:12 -0400

On Jun 11, 2009, at 11:44 AM, Chuck Lever wrote:
On Jun 11, 2009, at 12:48 AM, Neil Brown wrote:
On Thursday May 28, chuck.lever@xxxxxxxxxx wrote:
On May 28, 2009, at 2:33 AM, NeilBrown wrote:

[An alternative might be to make the sunrpc code always "connect"
UDP sockets so that "port not reachable" errors would get reported
back.  This requires a more intrusive change though and might have
other consequences]
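
For readers who want to see the behaviour being discussed, here is a
minimal userspace sketch.  It is not sunrpc code.  The helper names
closed_udp_port() and probe() are made up for illustration, and it
assumes a Linux-like stack with a working loopback interface.  A
connected UDP socket gets the "port unreachable" error back almost
immediately, while an unconnected one has to wait out its timer.

```python
import socket


def closed_udp_port(host="127.0.0.1"):
    """Return a UDP port that was just released, so nothing should listen on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def probe(port, connected, host="127.0.0.1", timeout=0.5):
    """Send one datagram to host:port and report how the failure surfaces.

    Returns "refused" if the kernel hands back the ICMP port-unreachable
    error, "timeout" if the caller has to wait for the timer, and "reply"
    if something actually answered.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            if connected:
                s.connect((host, port))  # only records the peer; sends nothing
                s.send(b"ping")
            else:
                s.sendto(b"ping", (host, port))
            s.recvfrom(64)
            return "reply"
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "timeout"
```

Calling probe() twice against the same closed port, once with
connected=True and once with connected=False, shows the difference
that matters for the unregister case: the first call fails fast, the
second only after the full timeout.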

We had discussed this about a year ago when I started adding IPv6
support.  I had suggested switching the local RPC client to use TCP
instead of UDP to solve exactly this time-out problem during
start-up.  There was some resistance to the idea because TCP would
leave privileged ports in TIME_WAIT (at shutdown, this is probably
not a significant concern).

Trond had intended to introduce connected UDP socket support to the
RPC client, although we were also interested in someday having a
single UDP socket for all RPC traffic... the design never moved on
from there.

My feeling at this point is that having a connected UDP socket
transport would be simpler and have broader benefits than waiting
for an eventual design that can accommodate multiple transport
instances sharing a single socket.

The use of connected UDP would have to be limited to known-safe cases
such as contacting the local portmap.  I believe there are still NFS
servers out there that - if multihomed - can reply from a different
address to the one the request was sent to.
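
The multihomed concern can be illustrated with another small
userspace sketch, again not sunrpc code.  reply_visible() is a
hypothetical helper, and a second socket on a different port stands
in for "a different address", since the connected-socket filter
checks both address and port.  The sketch assumes a working loopback
interface.

```python
import socket


def reply_visible(reply_from_same_socket, timeout=0.5):
    """Return True if a connected UDP client sees the reply to its request."""
    host = "127.0.0.1"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as other, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind((host, 0))
        other.bind((host, 0))  # stands in for the server's second interface
        server.settimeout(timeout)
        client.settimeout(timeout)

        # The client connects to the address it sends the request to.
        client.connect(server.getsockname())
        client.send(b"request")
        _, client_addr = server.recvfrom(64)

        # The reply leaves from either the expected or the "wrong" source.
        replier = server if reply_from_same_socket else other
        replier.sendto(b"reply", client_addr)
        try:
            client.recv(64)
            return True
        except socket.timeout:
            # The reply did not match the connected peer and was dropped.
            return False
```

With reply_from_same_socket=False the client never sees the reply,
which is how a connected socket would behave against a server that
answers from a different address.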

I think I advocated for adding an entirely new transport capability  
called CUDP at the time.  But this is definitely something to  
remember as we test.

If a new transport capability is added, at this point we would
likely need some additional code in the NFS mount parsing logic to
expose such a transport to user space.  So, leaving that parsing
logic alone should insulate the NFS client from the new transport
until we have more confidence.

And we would need to check that rpcbind does the right thing.  I
recently discovered that rpcbind is buggy and will sometimes respond
from the wrong interface.  I suspect localhost addresses are safe,
but we would need to check, or fix it.  (I fixed that bug in portmap,
or actually in glibc, 6 years ago and now it appears again in
rpcbind - groan!)

Details welcome.  We will probably need to fix libtirpc.

How hard would it be to add (optional) connected UDP support?  Would
we just make the code more like the TCP version, or are there any
gotchas that you know of that we would need to be careful of?

The code in net/sunrpc/xprtsock.c is a bunch of transport methods,  
many of which are shared between the UDP and TCP transport  
capabilities.  You could probably do this easily by creating a new  
xprt_class structure and a new ops vector, then reuse as many UDP  
methods as possible.  The TCP connect method could be usable as is,  
but it would be simple to copy-n-paste a new one if some variation  
is required.  Then, define a new XPRT_ value, and use that in  
rpcb_create_local().

I've thought about this some more...

It seems to me that you might be better off using the existing UDP  
transport code, but adding a new RPC_CLNT_CREATE_ flag to enable  
connected UDP semantics.  The two transports are otherwise exactly the  
same.
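
A rough userspace model of the flag idea, to show its shape only.
CREATE_CONNECTED_UDP, create_udp_client() and is_connected() are
hypothetical names invented for this sketch and do not correspond to
anything in the kernel; the actual flag name is still open.  There is
one creation path, and the flag only decides whether connect() is
called on the socket.

```python
import socket

# Hypothetical flag for illustration only; not an existing kernel flag.
CREATE_CONNECTED_UDP = 0x1


def create_udp_client(addr, flags=0, timeout=0.5):
    """Create a UDP client socket; optionally connect it to addr."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    if flags & CREATE_CONNECTED_UDP:
        # Same socket type and same code path; the only difference is
        # that the peer is fixed, so ICMP errors are reported back.
        sock.connect(addr)
    return sock


def is_connected(sock):
    """Return True if the socket has a fixed peer."""
    try:
        sock.getpeername()
        return True
    except OSError:
        return False
```

A caller talking to the local portmapper on port 111 would pass the
flag, and every other caller would leave it unset and keep today's
unconnected behaviour.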

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com