Can you confirm that you were NOT using unify in the setup??

regards,
avati

On Thu, Apr 05, 2007 at 01:09:16AM -0400, Brent A Nelson wrote:
> Awesome!
>
> Thanks,
>
> Brent
>
> On Wed, 4 Apr 2007, Anand Avati wrote:
>
> > Brent,
> >   thank you so much for your efforts in sending the output!
> > From the log it is clear the leaked fds are all for directories. Indeed
> > there was an issue with the releasedir() call reaching all the nodes.
> > The fix should be committed to tla today.
> >
> > Thanks!!
> >
> > avati
> >
> > On Wed, Apr 04, 2007 at 09:18:48PM -0400, Brent A Nelson wrote:
> >> I avoided restarting, as this issue would take a while to reproduce.
> >>
> >> jupiter01 and jupiter02 are mirrors of each other. All performance
> >> translators are in use, except for writebehind (due to the mtime bug).
> >>
> >> jupiter01:
> >> ls -l /proc/26466/fd | wc
> >>   65536  655408 7358168
> >> See attached for the ls -l output.
> >>
> >> jupiter02:
> >> ls -l /proc/3651/fd | wc
> >> ls -l /proc/3651/fd
> >> total 11
> >> lrwx------ 1 root root 64 2007-04-04 20:43 0 -> /dev/null
> >> lrwx------ 1 root root 64 2007-04-04 20:43 1 -> /dev/null
> >> lrwx------ 1 root root 64 2007-04-04 20:43 10 -> socket:[2565251]
> >> lrwx------ 1 root root 64 2007-04-04 20:43 2 -> /dev/null
> >> l-wx------ 1 root root 64 2007-04-04 20:43 3 -> /var/log/glusterfs/glusterfsd.log
> >> lrwx------ 1 root root 64 2007-04-04 20:43 4 -> socket:[2255275]
> >> lrwx------ 1 root root 64 2007-04-04 20:43 5 -> socket:[2249710]
> >> lr-x------ 1 root root 64 2007-04-04 20:43 6 -> eventpoll:[2249711]
> >> lrwx------ 1 root root 64 2007-04-04 20:43 7 -> socket:[2255306]
> >> lr-x------ 1 root root 64 2007-04-04 20:43 8 -> /etc/glusterfs/glusterfs-client.vol
> >> lr-x------ 1 root root 64 2007-04-04 20:43 9 -> /etc/glusterfs/glusterfs-client.vol
> >>
> >> Note that it looks like all those extra directories listed on jupiter01
> >> were locally rsynced from jupiter01's Lustre filesystems onto the
> >> GlusterFS client on jupiter01. A very large rsync from a different
> >> machine to jupiter02 didn't go nuts.
> >>
> >> Thanks,
> >>
> >> Brent
> >>
> >> On Wed, 4 Apr 2007, Anand Avati wrote:
> >>
> >>> Brent,
> >>>   I hope the system is still in the same state to dig some info out.
> >>> To verify that it is a file descriptor leak, can you please run this
> >>> test: on the server, run ps ax and get the PID of glusterfsd, then do
> >>> an ls -l on /proc/<pid>/fd/ and please mail the output. That should
> >>> give a precise idea of what is happening.
> >>> If the system has been reset out of that state, please give us the
> >>> spec file you are using and the commands you ran (for the major jobs,
> >>> like the heavy rsync) so that we can try to reproduce the error in
> >>> our setup.
> >>>
> >>> regards,
> >>> avati
> >>>
> >>> On Wed, Apr 04, 2007 at 01:12:33PM -0400, Brent A Nelson wrote:
> >>>> I put a 2-node GlusterFS mirror into use internally yesterday, as
> >>>> GlusterFS was looking pretty solid, and I rsynced a whole bunch of
> >>>> stuff to it. Today, however, an ls on any of the three clients
> >>>> gives me:
> >>>>
> >>>> ls: /backup: Too many open files
> >>>>
> >>>> It looks like glusterfsd hit a limit. Is this a bug
> >>>> (glusterfs/glusterfsd forgetting to close files; essentially, a file
> >>>> descriptor leak), or do I just need to increase the limit somewhere?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Brent

--
Shaw's Principle:
        Build a system that even a fool can use,
        and only a fool will want to use it.
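The check avati asks for boils down to counting the entries under /proc/<pid>/fd, exactly what Brent's "ls -l /proc/26466/fd | wc" did. The C program below is a minimal sketch of that same /proc walk; the program name and layout are made up for illustration and are not part of GlusterFS or its tooling.

/* fdcount.c - minimal sketch: count open fds of a process via /proc/<pid>/fd.
 * Illustrative only; equivalent to `ls /proc/<pid>/fd | wc -l`. */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }

        char path[64];
        snprintf(path, sizeof(path), "/proc/%s/fd", argv[1]);

        DIR *dir = opendir(path);
        if (!dir) {
                perror(path);
                return 1;
        }

        long count = 0;
        struct dirent *ent;
        while ((ent = readdir(dir)) != NULL) {
                /* skip "." and ".."; every other entry is one open descriptor */
                if (ent->d_name[0] != '.')
                        count++;
        }
        closedir(dir);

        printf("%s: %ld open fds\n", path, count);
        return 0;
}

Run it against the glusterfsd PID from ps ax; a count that only ever grows while directories are being walked is the leak signature seen on jupiter01.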
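The bug avati points to (the releasedir() call not reaching all the nodes) is the pattern where a directory-release is forwarded to only one child of a mirrored volume, so every other node keeps its directory descriptor open. The sketch below uses hypothetical types (child_node, mirror) purely to illustrate that shape; it is not the GlusterFS translator API and not the actual committed fix.

/* Illustrative only - hypothetical types, not the GlusterFS xlator interface. */
struct child_node {
        const char *name;
        int (*releasedir)(struct child_node *child, long dir_handle);
};

struct mirror {
        struct child_node *children;
        int                child_count;
};

/* Buggy shape: only the first child ever sees the release, so the
 * remaining nodes keep their directory fds open -> descriptor leak. */
static int mirror_releasedir_buggy(struct mirror *m, long dir_handle)
{
        return m->children[0].releasedir(&m->children[0], dir_handle);
}

/* Fixed shape: fan the release out to every node in the mirror. */
static int mirror_releasedir(struct mirror *m, long dir_handle)
{
        int ret = 0;

        for (int i = 0; i < m->child_count; i++) {
                if (m->children[i].releasedir(&m->children[i], dir_handle) != 0)
                        ret = -1; /* remember the failure, still release the rest */
        }
        return ret;
}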
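On Brent's other question, the descriptor ceiling itself can be raised, but jupiter01 was pinned at exactly 65536 fds, so with a genuine leak a higher limit only postpones the "Too many open files" error; the releasedir fix is the real cure. The snippet below is a minimal sketch of how a process could raise its own RLIMIT_NOFILE soft limit, assuming the hard limit (or root privileges) allows it; it is not something GlusterFS does or recommends here.

/* raise_nofile.c - minimal sketch: bump this process's RLIMIT_NOFILE soft limit. */
#include <stdio.h>
#include <sys/resource.h>

int raise_fd_limit(rlim_t wanted)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
                perror("getrlimit");
                return -1;
        }

        /* never ask for more than the hard limit permits */
        rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;

        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
                perror("setrlimit");
                return -1;
        }

        printf("RLIMIT_NOFILE soft limit now %lu (hard %lu)\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
        return 0;
}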