Brent,
the fix has been committed. Can you please check if it works for you?

regards,
avati

On Thu, Apr 05, 2007 at 02:09:26AM -0400, Brent A Nelson wrote:
> That's correct. I had commented out unify when narrowing down the mtime
> bug (which turned out to be writebehind) and then decided I had no reason
> to put it back in for this two-brick filesystem. It was mounted without
> unify when this issue occurred.
>
> Thanks,
>
> Brent
>
> On Wed, 4 Apr 2007, Anand Avati wrote:
>
> >Can you confirm that you were NOT using unify in the setup?
> >
> >regards,
> >avati
> >
> >
> >On Thu, Apr 05, 2007 at 01:09:16AM -0400, Brent A Nelson wrote:
> >>Awesome!
> >>
> >>Thanks,
> >>
> >>Brent
> >>
> >>On Wed, 4 Apr 2007, Anand Avati wrote:
> >>
> >>>Brent,
> >>>thank you so much for taking the effort to send the output!
> >>>From the log it is clear the leaked fds are all for directories. Indeed,
> >>>there was an issue with the releasedir() call reaching all the nodes.
> >>>The fix should be committed today to tla.
> >>>
> >>>Thanks!!
> >>>
> >>>avati
> >>>
> >>>
> >>>
> >>>On Wed, Apr 04, 2007 at 09:18:48PM -0400, Brent A Nelson wrote:
> >>>>I avoided restarting, as this issue would take a while to reproduce.
> >>>>
> >>>>jupiter01 and jupiter02 are mirrors of each other. All performance
> >>>>translators are in use, except for writebehind (due to the mtime bug).
> >>>>
> >>>>jupiter01:
> >>>>ls -l /proc/26466/fd |wc
> >>>>   65536  655408 7358168
> >>>>See attached for ls -l output.
> >>>>
> >>>>jupiter02:
> >>>>ls -l /proc/3651/fd |wc
> >>>>ls -l /proc/3651/fd
> >>>>total 11
> >>>>lrwx------ 1 root root 64 2007-04-04 20:43 0 -> /dev/null
> >>>>lrwx------ 1 root root 64 2007-04-04 20:43 1 -> /dev/null
> >>>>lrwx------ 1 root root 64 2007-04-04 20:43 10 -> socket:[2565251]
> >>>>lrwx------ 1 root root 64 2007-04-04 20:43 2 -> /dev/null
> >>>>l-wx------ 1 root root 64 2007-04-04 20:43 3 -> /var/log/glusterfs/glusterfsd.log
> >>>>lrwx------ 1 root root 64 2007-04-04 20:43 4 -> socket:[2255275]
> >>>>lrwx------ 1 root root 64 2007-04-04 20:43 5 -> socket:[2249710]
> >>>>lr-x------ 1 root root 64 2007-04-04 20:43 6 -> eventpoll:[2249711]
> >>>>lrwx------ 1 root root 64 2007-04-04 20:43 7 -> socket:[2255306]
> >>>>lr-x------ 1 root root 64 2007-04-04 20:43 8 -> /etc/glusterfs/glusterfs-client.vol
> >>>>lr-x------ 1 root root 64 2007-04-04 20:43 9 -> /etc/glusterfs/glusterfs-client.vol
> >>>>
> >>>>Note that it looks like all those extra directories listed on jupiter01
> >>>>were locally rsynced from jupiter01's Lustre filesystems onto the
> >>>>glusterfs client on jupiter01. A very large rsync from a different
> >>>>machine to jupiter02 didn't go nuts.
> >>>>
> >>>>Thanks,
> >>>>
> >>>>Brent
> >>>>
> >>>>On Wed, 4 Apr 2007, Anand Avati wrote:
> >>>>
> >>>>>Brent,
> >>>>>I hope the system is still in the same state to dig some info out.
> >>>>>To verify that it is a file descriptor leak, can you please run this
> >>>>>test. On the server, run ps ax and get the PID of glusterfsd. Then do
> >>>>>an ls -l on /proc/<pid>/fd/ and please mail the output of that. That
> >>>>>should give a precise idea of what is happening.
> >>>>>If the system has been reset out of that state, please give us the
> >>>>>spec file you are using and the commands you ran (for major jobs
> >>>>>such as a heavy rsync) so that we can try to reproduce the error in
> >>>>>our setup.
> >>>>>
> >>>>>regards,
> >>>>>avati
> >>>>>
> >>>>>
> >>>>>On Wed, Apr 04, 2007 at 01:12:33PM -0400, Brent A Nelson wrote:
> >>>>>>I put a 2-node GlusterFS mirror into use internally yesterday, as
> >>>>>>GlusterFS was looking pretty solid, and I rsynced a whole bunch of
> >>>>>>stuff to it. Today, however, an ls on any of the three clients
> >>>>>>gives me:
> >>>>>>
> >>>>>>ls: /backup: Too many open files
> >>>>>>
> >>>>>>It looks like glusterfsd hit a limit. Is this a bug
> >>>>>>(glusterfs/glusterfsd forgetting to close files; essentially, a
> >>>>>>file descriptor leak), or do I just need to increase the limit
> >>>>>>somewhere?
> >>>>>>
> >>>>>>Thanks,
> >>>>>>
> >>>>>>Brent
> >>>>>>
> >>>>>>
> >>>>>>_______________________________________________
> >>>>>>Gluster-devel mailing list
> >>>>>>Gluster-devel@xxxxxxxxxx
> >>>>>>http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>>>>>
> >>>>>
> >>>>>--
> >>>>>Shaw's Principle:
> >>>>>        Build a system that even a fool can use,
> >>>>>        and only a fool will want to use it.
> >>>>>
> >>>
> >>
> >
>
--
Shaw's Principle:
        Build a system that even a fool can use,
        and only a fool will want to use it.
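
For anyone repeating the /proc/<pid>/fd check quoted above on a process with tens of thousands of descriptors, a per-type count may be easier to read than the full ls -l listing. The following is only a rough sketch in Python, not part of GlusterFS; the names classify_fd, summarize_fds and proc_root are made up for illustration. It assumes a Linux /proc layout and enough privilege (root or the owning user) to read the target process's fd directory.

```python
import os
import stat
from collections import Counter


def classify_fd(fd_path):
    """Return a coarse label for the object an fd entry points at."""
    try:
        # os.stat follows the symlink, so we see the open object itself.
        mode = os.stat(fd_path).st_mode
    except OSError:
        # The descriptor may have been closed since the directory was listed.
        return "unknown"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "chardev"
    return "other"


def summarize_fds(pid, proc_root="/proc"):
    """Count the descriptors of `pid` by type (hypothetical helper)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    counts = Counter()
    for name in os.listdir(fd_dir):
        counts[classify_fd(os.path.join(fd_dir, name))] += 1
    return counts

# Example (PID taken from the jupiter01 listing in the thread):
#   for kind, n in summarize_fds(26466).most_common():
#       print(kind, n)
```

A count dominated by "directory" entries would point the same way as the ls -l output discussed in the thread, that is, at directory handles not being released, rather than at a limit that merely needs raising.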