On Mon, Jul 10 2017, Phil Kauffman wrote:

> On 07/10/2017 02:17 AM, NeilBrown wrote:
>> Another is to move the code around. In your case there are really just
>> 3 exports to each of 300+ clients (I assume "client.cs.uchicago.edu" in
>> the combined exports file is really different in different files).
>> So any one client only needs to consider 3 exports, not 300.
> This assumption is correct. We have ~300 clients and growing (slowly).
>
>> Could you test with this patch applied and see what
>> difference it makes?
> After being confused by conflicting test results, and determining that
> the nfs-server service takes a bit longer to start than 'systemctl' will
> let you believe, I believe Neil's patch works.
>
> The new cache.c file I am using to test with:
> http://people.cs.uchicago.edu/~kauffman/nfs-kernel-server/test_with_patch/cache.c
> It also contains the official patch from this thread:
> http://marc.info/?l=linux-nfs&m=138900709311582&w=2
>
> Building the new deb packages:
> http://people.cs.uchicago.edu/~kauffman/nfs-kernel-server/test_with_patch/build_deb.txt
>
> Install the nfs-kernel-server and nfs-common debs on the server and
> nfs-common on the client.
>
> I rebooted everything: client and server.
>
> I ran the ssh for loop (no stracing) with the result that every user was
> able to ssh in under a second (I consider this to be a success).

This does look encouraging ... but I'm not sure we are comparing apples
with apples.

In the original strace we see:

  read(4, ....)
  check on etab
  statfs(some path)
  write(4, ....)

which is exactly like the new strace. But then the old strace also has:

  read(5, ...)
  check on etab
  read /etc/mtab hundreds of times
  write(5, ...)

which is completely missing in the new strace. So the new code, which
doesn't read /etc/mtab as much, isn't being run at all.
File descriptor 4 is /proc/1926/net/rpc/nfsd.export/channel.
File descriptor 5 is /proc/1926/net/rpc/nfsd.fh/channel.

If the nfs server kernel has nothing cached, then when a request arrives
you should first see a transaction on fd 3
(/proc/1926/net/rpc/auth.unix.ip/channel) to map the IP address to a
"domain". Then a transaction on fd 5 (nfsd.fh) to map the domain plus a
filehandle fragment to an export point. Then a transaction on fd 4
(nfsd.export) to map the domain plus export point to some export
options.

If the request is a "LOOKUP", you might see another "fd 5" lookup after
the "fd 4" one, to find out if a sub-mountpoint is exported.

That is what the old strace showed. If you don't see all of those, then
something is cached.

Could you repeat your experiments after first running "exportfs -f" on
the nfs server? That should cause worst-case load on mountd. Until we
see the "read(5) .... write(5)" sequence completing quickly, I cannot be
sure that we have fixed anything.

Actually... for future traces, please add "-tt" so I can see where the
delays are, and how large they are.

(And thanks for providing all the tracing details - I love getting
unambiguous data!!)

Thanks,
NeilBrown
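[Editor's note: since the request above is for `strace -tt` output so the
delays become visible, a small helper along these lines could summarize
how long each upcall transaction on a given fd takes, i.e. the gap
between mountd's read() on the channel and its answering write(). This
is a hypothetical sketch, not part of nfs-utils; the sample trace lines
are illustrative.]

```python
import re

# Matches the start of an `strace -tt` line: "HH:MM:SS.ffffff syscall(fd, ..."
LINE = re.compile(r"^(\d+):(\d+):(\d+)\.(\d+)\s+(\w+)\((\d+)")

def parse(line):
    """Return (seconds_since_midnight, syscall, fd), or None if the line
    doesn't start with a timestamped syscall on a numeric fd."""
    m = LINE.match(line)
    if not m:
        return None
    h, mnt, s, frac, call, fd = m.groups()
    t = int(h) * 3600 + int(mnt) * 60 + int(s) + int(frac) / 10 ** len(frac)
    return t, call, int(fd)

def transaction_delays(lines, fd):
    """Seconds between each read() on `fd` (the kernel's upcall) and the
    following write() on the same fd (mountd's reply)."""
    delays, start = [], None
    for line in lines:
        p = parse(line)
        if p is None:
            continue
        t, call, f = p
        if f != fd:
            continue
        if call == "read":
            start = t
        elif call == "write" and start is not None:
            delays.append(t - start)
            start = None
    return delays

# Illustrative fd-5 (nfsd.fh) transaction with a 2.5 s gap inside it:
sample = [
    '10:00:00.000000 read(5, "...", 2048) = 40',
    '10:00:00.000100 statfs("/export", ...) = 0',
    '10:00:02.500000 write(5, "...", 60) = 60',
]
print(transaction_delays(sample, 5))  # → [2.5]
```

Run against a real `strace -tt -p <mountd-pid>` log, this would show
directly whether the read(5) ... write(5) sequence completes quickly.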