Re: Too many open files


 



Brent,
  thank you so much for taking the trouble to send the output!
From the log it is clear that the leaked fds are all for directories. Indeed,
there was an issue with the releasedir() call reaching all the nodes. The
fix should be committed to tla today.
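
For anyone who wants to check their own server the same way, here is a
rough sketch of how the ls -l listing of /proc/<pid>/fd can be summarised
by target type. It is not part of glusterfs and not the fix itself; the
helper name classify_fds and the category labels are made up for this
illustration, and it assumes a Linux /proc layout.

```python
import os
import stat
from collections import Counter


def classify_fds(fd_dir):
    """Count the entries of a /proc/<pid>/fd-style directory by target type.

    Hypothetical helper for illustration only. The labels "socket",
    "directory", "regular file" and "other" are our own choice.
    """
    counts = Counter()
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif not target.startswith("/"):
            # Targets such as eventpoll:[...] or pipe:[...] are not paths.
            counts["other"] += 1
        else:
            try:
                mode = os.stat(link).st_mode  # follows the link to the target
            except OSError:
                counts["other"] += 1
                continue
            if stat.S_ISDIR(mode):
                counts["directory"] += 1
            elif stat.S_ISREG(mode):
                counts["regular file"] += 1
            else:
                counts["other"] += 1  # e.g. character devices like /dev/null
    return counts
```

Called as classify_fds("/proc/26466/fd") by root (or by the user owning the
process), a "directory" count that dominates the total would point to the
same kind of leak as in the attached listing.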

Thanks!!

avati



On Wed, Apr 04, 2007 at 09:18:48PM -0400, Brent A Nelson wrote:
> I avoided restarting, as this issue would take a while to reproduce.
> 
> jupiter01 and jupiter02 are mirrors of each other.  All performance 
> translators are in use, except for writebehind (due to the mtime bug).
> 
> jupiter01:
> ls -l /proc/26466/fd |wc
>   65536  655408 7358168
> See attached for ls -l output.
> 
> jupiter02:
> ls -l /proc/3651/fd |wc
> ls -l /proc/3651/fd
> total 11
> lrwx------ 1 root root 64 2007-04-04 20:43 0 -> /dev/null
> lrwx------ 1 root root 64 2007-04-04 20:43 1 -> /dev/null
> lrwx------ 1 root root 64 2007-04-04 20:43 10 -> socket:[2565251]
> lrwx------ 1 root root 64 2007-04-04 20:43 2 -> /dev/null
> l-wx------ 1 root root 64 2007-04-04 20:43 3 -> /var/log/glusterfs/glusterfsd.log
> lrwx------ 1 root root 64 2007-04-04 20:43 4 -> socket:[2255275]
> lrwx------ 1 root root 64 2007-04-04 20:43 5 -> socket:[2249710]
> lr-x------ 1 root root 64 2007-04-04 20:43 6 -> eventpoll:[2249711]
> lrwx------ 1 root root 64 2007-04-04 20:43 7 -> socket:[2255306]
> lr-x------ 1 root root 64 2007-04-04 20:43 8 -> /etc/glusterfs/glusterfs-client.vol
> lr-x------ 1 root root 64 2007-04-04 20:43 9 -> /etc/glusterfs/glusterfs-client.vol
> 
> Note that it looks like all those extra directories listed on jupiter01 
> were locally rsynced from jupiter01's Lustre filesystems onto the 
> glusterfs client on jupiter01.  A very large rsync from a different 
> machine to jupiter02 didn't go nuts.
> 
> Thanks,
> 
> Brent
> 
> On Wed, 4 Apr 2007, Anand Avati wrote:
> 
> >Brent,
> > I hope the system is still in the same state so we can dig some info out.
> >To verify that it is a file descriptor leak, can you please run this
> >test? On the server, run ps ax and get the PID of glusterfsd. Then do
> >an ls -l on /proc/<pid>/fd/ and please mail the output of that. That
> >should give a precise idea of what is happening.
> > If the system has been reset out of that state, please give us the
> >spec file you are using and the commands you ran (for the major jobs,
> >like the heavy rsync) so that we can try to reproduce the error in our
> >setup.
> >
> >regards,
> >avati
> >
> >
> >On Wed, Apr 04, 2007 at 01:12:33PM -0400, Brent A Nelson wrote:
> >>I put a 2-node GlusterFS mirror into use internally yesterday, as
> >>GlusterFS was looking pretty solid, and I rsynced a whole bunch of stuff
> >>to it.  Today, however, an ls on any of the three clients gives me:
> >>
> >>ls: /backup: Too many open files
> >>
> >>It looks like glusterfsd hit a limit.  Is this a bug (glusterfs/glusterfsd
> >>forgetting to close files; essentially, a file descriptor leak), or do I
> >>just need to increase the limit somewhere?
> >>
> >>Thanks,
> >>
> >>Brent
> >>
> >>
> >>_______________________________________________
> >>Gluster-devel mailing list
> >>Gluster-devel@xxxxxxxxxx
> >>http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
> >
> >-- 
> >Shaw's Principle:
> >       Build a system that even a fool can use,
> >       and only a fool will want to use it.
> >



-- 
Shaw's Principle:
        Build a system that even a fool can use,
        and only a fool will want to use it.



