On Mon, Jul 7, 2008 at 2:51 PM, Trond Myklebust
<Trond.Myklebust@xxxxxxxxxx> wrote:
> On Mon, 2008-07-07 at 14:43 -0400, Chuck Lever wrote:
>> On Jul 7, 2008, at 2:20 PM, Trond Myklebust wrote:
>> > On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
>> >> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
>> >>> Hi Trond-
>> >>>
>> >>> Seven patches that implement kernel RPC service registration via
>> >>> rpcbind v4. This allows the kernel to advertise IPv4-only
>> >>> services on hosts with IPv6 addresses, for example.
>> >>
>> >> This is Trond's bailiwick, but I read through all 7 quickly and
>> >> they looked good to me....
>> >
>> > They look more or less OK to me too; however, I'm a bit unhappy
>> > about the RPC_TASK_ONESHOT name: it isn't at all descriptive.
>>
>> Open to suggestions. I thought RPC_TASK_FAIL_WITHOUT_CONNECTION was
>> a bit wordy ;-)
>
> RPC_TASK_CONNECT_ONCE ?

That's not the semantic I was really going for. FAIL_ON_CONNRESET is
probably closer.

>> > I also have questions about the change to a TCP socket here. Why
>> > not just implement connected UDP sockets?
>>
>> Changing rpcb_register() to use a TCP socket is less work overall,
>> and we get a positive handshake between the kernel and user space
>> when the TCP connection is opened.
>>
>> Other services might also want to use TCP+ONESHOT for several short
>> requests over a real network with actual packet loss, but they might
>> find CUDP+ONESHOT less practical or reliable (or even forbidden, in
>> the case of NFSv4). So we would end up with something of a one-off
>> implementation for rpcb_register.
>
> I don't see what that has to do with anything: the connection-failed
> code path in call_connect_status() should be the same in both the TCP
> and the UDP case.

If you would like connected UDP, I won't object to you implementing
it. However, I have never tested whether a connected UDP socket gives
the desired semantics without extra code in the UDP transport (for
example, an ->sk_error callback). I don't think it's worth the hassle
if we have to add code to UDP that only this tiny use case would need.

>> The downside of using TCP in this case is that it's more overhead:
>> eight packets instead of two for registration in the common case,
>> and it leaves a single privileged port in TIME_WAIT for each
>> registered service. I don't think this matters much, as registration
>> happens quite infrequently.
>
> The problem is that registration usually happens at boot time, which
> is also when most of the NFS 'mount' requests will be eating
> privileged ports.

You're talking about the difference between supporting, say, 1358
mounts at boot time versus 1357. In most cases, a client with hundreds
of mounts will use up exactly one extra privileged TCP port to
register NLM, during the first lockd_up() call. If these are all NFSv4
mounts, it will use exactly zero extra ports, since the NFSv4 callback
service is not even registered.

Considering that _each_ mount operation can take between two and five
privileged ports, while registering both NFSD and NLM would take
exactly two ports at boot time, I think registration is the wrong
place to optimize.

--
Chuck Lever
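
For readers following the connected-UDP question above, here is a
minimal userspace sketch of the semantics in play. It is an
illustration under stated assumptions, not kernel sunrpc code: on
Linux, connect(2) on a datagram socket fixes the peer address and lets
the kernel report ICMP port-unreachable errors on that socket as
ECONNREFUSED, which is the kind of positive failure feedback a
registration call wants. The loopback address and port number are
arbitrary choices for the demo.

/*
 * Userspace sketch of connected-UDP error semantics (not kernel
 * sunrpc code).  Sending to a port with no listener elicits an ICMP
 * port-unreachable, which the kernel reports on a later recv() as
 * ECONNREFUSED instead of a silent timeout.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/time.h>

int main(void)
{
	struct sockaddr_in sin;
	struct timeval tv = { .tv_sec = 2 };
	char buf[16];
	int sock;

	sock = socket(AF_INET, SOCK_DGRAM, 0);
	if (sock < 0) {
		perror("socket");
		return 1;
	}

	/* Don't block forever if no ICMP error ever arrives. */
	setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(45678);	/* assumed to have no listener */
	inet_pton(AF_INET, "127.0.0.1", &sin.sin_addr);

	/* connect(2) on a datagram socket fixes the peer address and
	 * enables per-socket delivery of ICMP errors. */
	if (connect(sock, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		perror("connect");
		return 1;
	}

	if (send(sock, "ping", 4, 0) < 0)
		perror("send");

	/* Typically fails with ECONNREFUSED once the ICMP
	 * port-unreachable triggered by the datagram arrives. */
	if (recv(sock, buf, sizeof(buf), 0) < 0)
		fprintf(stderr, "recv: %s\n", strerror(errno));

	close(sock);
	return 0;
}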
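
Since the thread's subject is registration itself, a similarly hedged
sketch of the userspace analogue may also help: registering a service
with the local rpcbind through libtirpc's rpcb_set(), roughly the
operation the patch series performs from inside the kernel. The
program number, version, and universal address below are made-up
values for illustration, and rpcbind must be running; on a typical
Linux system, build with -I/usr/include/tirpc and -ltirpc.

/*
 * Userspace sketch: register a service with rpcbind via rpcb_set().
 * EXAMPLE_PROG is a hypothetical program number in the user-defined
 * range; the universal address "127.0.0.1.48.57" encodes port 12345
 * (48 * 256 + 57) in rpcbind's host.p1.p2 notation.
 */
#include <stdio.h>
#include <netconfig.h>
#include <rpc/rpc.h>

#define EXAMPLE_PROG	0x20000099	/* hypothetical program number */
#define EXAMPLE_VERS	1

int main(void)
{
	struct netconfig *nconf;
	struct netbuf *addr;

	/* "tcp" means TCP over IPv4 here; a "tcp6" entry would
	 * advertise an IPv6 address instead. */
	nconf = getnetconfigent("tcp");
	if (nconf == NULL) {
		fprintf(stderr, "no netconfig entry for tcp\n");
		return 1;
	}

	addr = uaddr2taddr(nconf, "127.0.0.1.48.57");
	if (addr == NULL) {
		fprintf(stderr, "uaddr2taddr failed\n");
		return 1;
	}

	if (!rpcb_set(EXAMPLE_PROG, EXAMPLE_VERS, nconf, addr)) {
		fprintf(stderr, "rpcb_set failed (is rpcbind running?)\n");
		return 1;
	}
	printf("registered\n");

	/* Clean up the registration for this demo. */
	rpcb_unset(EXAMPLE_PROG, EXAMPLE_VERS, nconf);
	freenetconfigent(nconf);
	return 0;
}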