On Jul 6, 2009, at 1:14 PM, Trond Myklebust wrote:
On Mon, 2009-07-06 at 12:57 -0400, Chuck Lever wrote:
On Jul 6, 2009, at 12:40 PM, Trond Myklebust wrote:
On Mon, 2009-07-06 at 12:31 -0400, Chuck Lever wrote:
I have considered that. AF_LOCAL in fact could replace all of our
upcall mechanisms. However, portmapper, which doesn't support
AF_LOCAL, is still used in some distributions.
As could AF_NETLINK, fork(), pipes, fifos, etc... Again: why would we
want to saddle ourselves with rpc over AF_LOCAL?
TI-RPC supports AF_LOCAL RPC transports.
[cel@matisse notify-one]$ rpcinfo
program  version  netid  address                 service     owner
100000   4        tcp6   ::.0.111                portmapper  superuser
100000   3        tcp6   ::.0.111                portmapper  superuser
100000   4        udp6   ::.0.111                portmapper  superuser
100000   3        udp6   ::.0.111                portmapper  superuser
100000   4        tcp    0.0.0.0.0.111           portmapper  superuser
100000   3        tcp    0.0.0.0.0.111           portmapper  superuser
100000   2        tcp    0.0.0.0.0.111           portmapper  superuser
100000   4        udp    0.0.0.0.0.111           portmapper  superuser
100000   3        udp    0.0.0.0.0.111           portmapper  superuser
100000   2        udp    0.0.0.0.0.111           portmapper  superuser
100000   4        local  /var/run/rpcbind.sock   portmapper  superuser
100000   3        local  /var/run/rpcbind.sock   portmapper  superuser
100024   1        udp    0.0.0.0.206.127         status      29
100024   1        tcp    0.0.0.0.166.105         status      29
100024   1        udp6   ::.141.238              status      29
100024   1        tcp6   ::.192.160              status      29
[cel@matisse notify-one]$
The listing for '/var/run/rpcbind.sock' is rpcbind's AF_LOCAL
listener. TI-RPC's rpcb_foo() calls use this method of accessing the
rpcbind database rather than going over loopback.
rpcbind scrapes the caller's effective UID off the transport socket
and uses that for authentication. Note the "owner" column... that
comes from the socket's UID, not from the r_owner field. When a
service is registered over the network, the owner column says
"unknown" and basically anyone can unset it.
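
To illustrate what "scraping the UID off the transport socket" can look like from user space, here is a minimal Linux-only sketch. It assumes the SO_PEERCRED socket option and uses a hypothetical helper name, peer_credentials(); it is not a copy of rpcbind's or libtirpc's code, which may obtain the credentials differently.

```python
import socket
import struct

def peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end of an AF_UNIX socket.

    Assumption: Linux, where SO_PEERCRED yields a struct ucred made of
    three native ints (pid, uid, gid). The uid is the peer's effective UID
    at the time the socket pair was created or connected.
    """
    size = struct.calcsize("3i")
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack("3i", raw)
```

Because the kernel fills in these credentials, the caller cannot forge them the way it could forge an owner string carried in the request body. A TCP or UDP caller has no equivalent, which is consistent with the "unknown" owner described above.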
If the kernel used AF_LOCAL to register its services, it would mean we
would never use a network port for local rpcbind calls between the
kernel and rpcbind, and rpcbind could automatically prevent the
kernel's RPC services from getting unset by malicious users. If
/var/run/rpcbind.sock isn't there, the kernel would know immediately
that rpcbind wasn't running.
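
A rough user-space sketch of the "know immediately" point follows. The socket path is the one from the rpcinfo listing; the function name local_listener_state() and the use of a SOCK_STREAM socket are assumptions made for illustration, and this says nothing about how a kernel implementation would be written.

```python
import socket

RPCBIND_SOCK = "/var/run/rpcbind.sock"  # path taken from the rpcinfo output

def local_listener_state(path=RPCBIND_SOCK):
    """Classify an AF_LOCAL listener without waiting for any timeout.

    Returns "listening" if connect() succeeds, "absent" if the socket
    file does not exist, and "stale" if the file exists but nothing is
    accepting on it. Other errors (for example EACCES) propagate.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return "listening"
    except FileNotFoundError:        # ENOENT: no socket file at all
        return "absent"
    except ConnectionRefusedError:   # ECONNREFUSED: file left behind, no listener
        return "stale"
    finally:
        s.close()
```

Both failure cases are reported synchronously by connect(), so no retransmit or connect timeout is involved.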
So what? You can achieve the same with any number of communication
channels (including the network). Just add a timeout to the current
'connect()' function, and set it to a low value when doing rpcbind
upcalls.
I suggested such a scheme last year when we first discussed connected
UDP, and it was decided that especially short timeouts for local
rpcbind calls were not appropriate.
In general, however, the network layer does tell us immediately when
the service is not running (ICMP port unreachable or RST). The
kernel's RPC client is basically ignoring that information.
What's so special about libtirpc or rpcbind that we have to keep
redesigning the kernel to work around their limitations instead of the
other way round?
I'm not sure what you're referring to, specifically.
However, since rpcbind is a standard network protocol, the kernel
really does have to talk the protocol correctly if we want to
interoperate with non-Linux implementations. For local-only cases, we
need to ensure that the kernel is backwards compatible with portmapper.
In this case, Suresh and Neil are dealing with a problem that occurs
whether rpcbind or portmapper is running -- basically during shutdown,
if user space has killed those processes, the kernel waits for a bit
instead of deciding immediately that it should exit. Nothing to do
with TI-RPC, though TI-RPC does offer a potential solution (AF_LOCAL).
In the mount.nfs case, user space uses RST/port unreachable
specifically for determining when the server does not support a
particular transport (see nfs_probe_port). That code is actually
baked into the mount command; it's not part of the library. If we
want to see version/transport negotiation in the kernel, then the
kernel rpcbind client has to have the ability to detect quickly when
the remote does not support the requested transport. Again, nothing
to do with TI-RPC.
In both cases, it turns out that the library implementations in user
space already fail quickly. RPC_CANTRECV is returned if an attempt is
made to send an rpcbind query to an inactive UDP port.
RPC_SYSTEMERROR/ECONNREFUSED is returned if an attempt is made to send
an rpcbind query to an inactive TCP port. In my view, the kernel is
lacking here, and should be made to emulate user space more closely.
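
As a sketch of the fail-fast behavior described above, the following shows how a user-space program sees a refused TCP connection (RST) and, on a connected UDP socket, an ICMP port unreachable. The function names probe_tcp() and probe_udp() are hypothetical, the one-byte payload is a placeholder rather than a valid rpcbind call, and this is not the code in mount.nfs or libtirpc.

```python
import socket

def probe_tcp(host, port, timeout=1.0):
    """Return "open", "refused" (RST received), or "timeout"."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        s.close()

def probe_udp(host, port, timeout=1.0):
    """Return "answered", "refused" (ICMP port unreachable), or "timeout".

    The socket is connected, so the kernel can match an incoming ICMP
    error to it and report ECONNREFUSED on a later send() or recv().
    An unconnected UDP socket would normally just time out.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        s.send(b"\0")   # placeholder payload, not a real RPC request
        s.recv(1)
        return "answered"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        s.close()
```

The "timeout" result is only the fallback for cases where no error indication comes back, for example when a firewall silently drops the packet.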
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com