Re: Massive NFS problems on large cluster with large number of mounts

On Tue, Jul 01, 2008 at 10:19:55AM +0200, Carsten Aulbert wrote:
> Hi all (now to the right email list),
> 
> We are running a large cluster and do a lot of cross-mounting between
> the nodes. To get this running we are running a lot of nfsd (196) and
> use mountd with 64 threads, just in case we get a massive number of hits
> onto a single node. All this is on Debian Etch with a recent 2.6.24
> kernel using autofs4 at the moment to do the automounts.

I'm slightly confused--the above is all about server configuration, but
the below seems to describe only client problems?

> 
> When running these two not nice scripts:
> 
> $ cat test_mount
> #!/bin/sh
> 
> n_node=1000
> 
> for i in `seq 1 $n_node`;do
>         n=`echo $RANDOM%1342+10001 | bc| sed -e "s/1/n/"`
>         $HOME/bin/mount.sh $n&
>         echo -n .
> done
> 
> $ cat mount.sh
> #!/bin/sh
> 
> dir="/distributed/spray/data/EatH/S5R1"
> 
> ping -c1 -w1 $1 > /dev/null && file="/atlas/node/$1$dir/"`ls -f /atlas/node/$1$dir/|head -n 50 | tail -n 1`
> md5sum ${file}
> 
> With that we encounter different problems:
> 
> Running this gives this in syslog:
> Jul  1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa58/idmap): Too many open files
> Jul  1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa58/idmap): Too many open files
> Jul  1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa5e/idmap): Too many open files
> Jul  1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa5e/idmap): Too many open files
> Jul  1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa9c/idmap): Too many open files
> 
> Which is not surprising to me. However, there are a few things I'm
> wondering about.
> 
> (1) All our mounts use nfsvers=3, so why is rpc.idmapd involved at all?

Are there actually files named "idmap" in those directories?  (Looks to
me like they're only created in the v4 case, so I assume those open
calls would return ENOENT if they didn't return EMFILE....)
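
One way to answer that question on the client is to count the clnt*
directories and see how many of them contain an "idmap" entry. This is
only a rough sketch: the default path is taken from the syslog lines
quoted above, and the function name count_idmap is made up for
illustration.

```shell
#!/bin/sh
# Count clnt* directories under an rpc_pipefs "nfs" directory and how
# many of them contain an "idmap" entry.
# Assumption: the default path below comes from the quoted syslog lines;
# pass a different directory as the first argument if yours differs.
count_idmap() {
    base="${1:-/var/lib/nfs/rpc_pipefs/nfs}"
    total=0
    with_idmap=0
    for d in "$base"/clnt*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$d" ] || continue
        total=$((total + 1))
        # Use -e rather than -f: the entry is a pipe, not a regular file.
        if [ -e "$d/idmap" ]; then
            with_idmap=$((with_idmap + 1))
        fi
    done
    echo "clnt dirs: $total, with idmap: $with_idmap"
}

count_idmap "$@"
```

If every mount really is v3, I would expect the second number to be
zero.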

> (2) Why is this daemon growing so extremely large?
> # ps aux|grep rpc.idmapd
> root      2309  0.1 16.2 2037152 1326944 ?     Ss   Jun30   1:24 /usr/sbin/rpc.idmapd

I think rpc.idmapd keeps some state for each of those directories,
whether or not it belongs to a v4 client, since it uses dnotify to watch
for an "idmap" file to appear in each one.  The above shows about 2k per
mount?

--b.

> NOTE: We are now disabling this one, but it would still be nice to
> understand why there seems to be a memory leak.
> 
> (3) The script maxes out at about 340 concurrent mounts, any idea how to
> increase this number? We are already running all servers with the
> insecure option, thus low ports should not be a restriction.
> (4) After running this script /etc/mtab and /proc/mounts are out of
> sync. Ian Kent from autofs fame suggested a broken local mount
> implementation which does not lock mtab well enough. Any idea about that?
> 
> We are currently testing autofs5 and this is not giving these messages,
> but still we are not using high/unprivileged ports.
> 
> TIA for any help you might give us.
> 
> Cheers
> 
> Carsten
> 
> -- 
> Dr. Carsten Aulbert - Max Planck Institut für Gravitationsphysik
> Callinstraße 38, 30167 Hannover, Germany
> Fon: +49 511 762 17185, Fax: +49 511 762 17193
> http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html