Brent,
can you please send me your spec files? I am able to 'ls' without any
problems, and there is no fd leak observed. I have loaded just cluster/afr,
and previously had loaded all the performance xlators on both the server
and client side together, and in both cases things worked perfectly fine.
I'm guessing the encryption makefile issue caused a bad build (things were
changed in libglusterfs). The makefile is committed now, though (along with
the -l fix). Please do a make uninstall/clean/install, since quite a lot of
changes have gone in over the last few days.
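
Roughly the following, on both server and client. This is only a sketch: the
server vol filename and the -f spec-file flag are assumptions (adjust them to
however you normally start the daemons), the client vol path and the /backup
mount point are taken from your earlier mails, and -l is the log-file option
you will need to pass explicitly until the default log location is confirmed
fixed:

    cd glusterfs                      # your source checkout
    make uninstall                    # remove the stale install first
    make clean
    ./configure && make && make install

    # start both sides with an explicit log file for now
    glusterfsd -f /etc/glusterfs/glusterfs-server.vol \
               -l /var/log/glusterfs/glusterfsd.log
    glusterfs -f /etc/glusterfs/glusterfs-client.vol \
              -l /var/log/glusterfs/glusterfs.log /backup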

avati

On Fri, Apr 06, 2007 at 03:33:30PM -0400, Brent A Nelson wrote:
> glusterfsd dies on both nodes almost immediately (I can ls successfully
> once before it dies, but cd in and they're dead). The glusterfs processes
> are still running, but I of course get "Transport endpoint is not
> connected."
>
> Also, glusterfsd and glusterfs no longer seem to know where to log by
> default and refuse to start unless I give the -l option on each.
>
> Thanks,
>
> Brent
>
> On Fri, 6 Apr 2007, Anand Avati wrote:
>
> > Brent,
> > the fix has been committed. Can you please check whether it works for
> > you?
> >
> > regards,
> > avati
> >
> > On Thu, Apr 05, 2007 at 02:09:26AM -0400, Brent A Nelson wrote:
> >> That's correct. I had commented out unify when narrowing down the mtime
> >> bug (which turned out to be writebehind) and then decided I had no
> >> reason to put it back in for this two-brick filesystem. It was mounted
> >> without unify when this issue occurred.
> >>
> >> Thanks,
> >>
> >> Brent
> >>
> >> On Wed, 4 Apr 2007, Anand Avati wrote:
> >>
> >>> Can you confirm that you were NOT using unify in the setup?
> >>>
> >>> regards,
> >>> avati
> >>>
> >>> On Thu, Apr 05, 2007 at 01:09:16AM -0400, Brent A Nelson wrote:
> >>>> Awesome!
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Brent
> >>>>
> >>>> On Wed, 4 Apr 2007, Anand Avati wrote:
> >>>>
> >>>>> Brent,
> >>>>> thank you so much for taking the trouble to send the output!
> >>>>> From the log it is clear that the leaked fds are all for
> >>>>> directories. Indeed, there was an issue with the releasedir() call
> >>>>> not reaching all the nodes. The fix should be committed to tla today.
> >>>>>
> >>>>> Thanks!!
> >>>>>
> >>>>> avati
> >>>>>
> >>>>> On Wed, Apr 04, 2007 at 09:18:48PM -0400, Brent A Nelson wrote:
> >>>>>> I avoided restarting, as this issue would take a while to reproduce.
> >>>>>>
> >>>>>> jupiter01 and jupiter02 are mirrors of each other. All performance
> >>>>>> translators are in use, except for writebehind (due to the mtime
> >>>>>> bug).
> >>>>>>
> >>>>>> jupiter01:
> >>>>>> ls -l /proc/26466/fd | wc
> >>>>>> 65536 655408 7358168
> >>>>>> See attached for the ls -l output.
> >>>>>>
> >>>>>> jupiter02:
> >>>>>> ls -l /proc/3651/fd | wc
> >>>>>> ls -l /proc/3651/fd
> >>>>>> total 11
> >>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 0 -> /dev/null
> >>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 1 -> /dev/null
> >>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 10 -> socket:[2565251]
> >>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 2 -> /dev/null
> >>>>>> l-wx------ 1 root root 64 2007-04-04 20:43 3 -> /var/log/glusterfs/glusterfsd.log
> >>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 4 -> socket:[2255275]
> >>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 5 -> socket:[2249710]
> >>>>>> lr-x------ 1 root root 64 2007-04-04 20:43 6 -> eventpoll:[2249711]
> >>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 7 -> socket:[2255306]
> >>>>>> lr-x------ 1 root root 64 2007-04-04 20:43 8 -> /etc/glusterfs/glusterfs-client.vol
> >>>>>> lr-x------ 1 root root 64 2007-04-04 20:43 9 -> /etc/glusterfs/glusterfs-client.vol
> >>>>>>
> >>>>>> Note that it looks like all those extra directories listed on
> >>>>>> jupiter01 were locally rsynced from jupiter01's Lustre filesystems
> >>>>>> onto the glusterfs client on jupiter01. A very large rsync from a
> >>>>>> different machine to jupiter02 didn't go nuts.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Brent
> >>>>>>
> >>>>>> On Wed, 4 Apr 2007, Anand Avati wrote:
> >>>>>>
> >>>>>>> Brent,
> >>>>>>> I hope the system is still in the same state so we can dig some
> >>>>>>> information out. To verify that it is a file descriptor leak, can
> >>>>>>> you please run this test: on the server, run ps ax and get the PID
> >>>>>>> of glusterfsd, then do an ls -l on /proc/<pid>/fd/ and mail the
> >>>>>>> output of that. That should give a precise idea of what is
> >>>>>>> happening.
> >>>>>>> If the system has already been reset out of that state, please send
> >>>>>>> us the spec file you are using and the commands you ran (for the
> >>>>>>> major jobs, like the heavy rsync) so that we can try to reproduce
> >>>>>>> the error in our setup.
> >>>>>>>
> >>>>>>> regards,
> >>>>>>> avati
> >>>>>>>
> >>>>>>> On Wed, Apr 04, 2007 at 01:12:33PM -0400, Brent A Nelson wrote:
> >>>>>>>> I put a 2-node GlusterFS mirror into use internally yesterday, as
> >>>>>>>> GlusterFS was looking pretty solid, and I rsynced a whole bunch of
> >>>>>>>> stuff to it. Today, however, an ls on any of the three clients
> >>>>>>>> gives me:
> >>>>>>>>
> >>>>>>>> ls: /backup: Too many open files
> >>>>>>>>
> >>>>>>>> It looks like glusterfsd hit a limit. Is this a bug
> >>>>>>>> (glusterfs/glusterfsd forgetting to close files; essentially, a
> >>>>>>>> file descriptor leak), or do I just need to increase the limit
> >>>>>>>> somewhere?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Brent
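
For anyone else who hits this before updating, the check Brent ran boils
down to the few commands below. This is a rough sketch: pidof and the
fd-type breakdown are my additions, and the ulimit value is whatever your
distribution happens to default to.

    PID=$(pidof glusterfsd)          # or pick the PID out of `ps ax`
    ls /proc/$PID/fd | wc -l         # total open descriptors right now
    ulimit -n                        # open-file limit the daemon inherits
                                     # when started from a shell like this

    # group the descriptors by target; on a leaking brick almost all of
    # them point at directories on the backend filesystem
    for fd in /proc/$PID/fd/*; do
        readlink "$fd"
    done | sort | uniq -c | sort -rn | head

If the count keeps climbing while directories are being walked, it is the
leak addressed by the releasedir() fix, not something to solve by raising
the limit.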

--
Shaw's Principle:
        Build a system that even a fool can use,
        and only a fool will want to use it.
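
PS: once both nodes are rebuilt, an easy way to confirm that the
releasedir() fix took is to repeat the directory-heavy workload and watch
the server's fd count stay flat (again just a sketch; the rsync source path
is a placeholder and the 10-second interval is arbitrary):

    PID=$(pidof glusterfsd)
    # rerun something like the local rsync that originally triggered the leak
    rsync -a /lustre/somedir/ /backup/somedir/ &
    # the count should stay around a dozen instead of climbing to the limit
    watch -n 10 "ls /proc/$PID/fd | wc -l"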