On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> Franco,
>      Thanks for providing the logs. I just copied over the logs to my
> machine. Most of the logs I see are related to "No such file or
> directory". I wonder what led to this. Do you have any idea?

No, but I'm just looking at my 3.5 Gluster volume and it has a directory
that looks empty but can't be deleted. When I look at the directories on
the servers, there are definitely files in there.

[franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
[franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
total 8
drwxrwxr-x 2 franco support 60 May 21 03:58 .
drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..

[root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
/data21/gvol/franco/dir1226/dir25:
total 2081
drwxrwxr-x 13 1348 200 13 May 21 03:58 .
drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032

/data22/gvol/franco/dir1226/dir25:
total 2084
drwxrwxr-x 13 1348 200 13 May 21 03:58 .
drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
.....

Maybe Gluster is losing track of the files??
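In case it helps narrow this down, here is roughly how I intend to compare
that directory across the bricks. Just a sketch: it assumes getfattr is
installed on the servers and is run as root (so the trusted.* xattrs are
visible), and that /data*/gvol matches all the local bricks as in the
listing above.

# Run on each storage server: dump the gluster xattrs and entry count for
# the stuck directory on every local brick. The trusted.gfid value should
# be identical on every brick; a mismatch there, or a missing
# trusted.glusterfs.dht layout, could explain the client seeing the
# directory as empty while the bricks still hold entries.
for d in /data*/gvol/franco/dir1226/dir25; do
    echo "== $d =="
    getfattr -d -m . -e hex "$d"
    ls -a "$d" | wc -l
done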
>
> Pranith
>
> On 06/02/2014 02:48 PM, Franco Broi wrote:
> > Hi Pranith
> >
> > Here's a listing of the brick logs; it looks very odd, especially the
> > size of the log for data10.
> >
> > [root@nas3 bricks]# ls -ltrh
> > total 2.6G
> > -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> > -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> > -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> > -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> > -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> > -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> > -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> > -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> > -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> > -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> > -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> > -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> > -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> > -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> > -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> > -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> > -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> > -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> > -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> > -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> >
> > Too big to post everything.
> >
> > Cheers,
> >
> > On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> >> ----- Original Message -----
> >>> From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> >>> To: "Franco Broi" <franco.broi@xxxxxxxxxx>
> >>> Cc: gluster-users@xxxxxxxxxxx
> >>> Sent: Monday, June 2, 2014 7:01:34 AM
> >>> Subject: Re: glusterfsd process spinning
> >>>
> >>> ----- Original Message -----
> >>>> From: "Franco Broi" <franco.broi@xxxxxxxxxx>
> >>>> To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> >>>> Cc: gluster-users@xxxxxxxxxxx
> >>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> >>>> Subject: Re: glusterfsd process spinning
> >>>>
> >>>> The volume is almost completely idle now and the CPU for the brick
> >>>> process has returned to normal. I've included the profile, and I think
> >>>> it shows the latency for the bad brick (data12) is unusually high,
> >>>> probably indicating the filesystem is at fault after all??
> >>> I am not sure if we can believe the outputs now that you say the brick
> >>> returned to normal. Next time it is acting up, do the same procedure
> >>> and post the result.
> >> On second thought, maybe it's not a bad idea to inspect the log files
> >> of the bricks on nas3. Could you post them?
> >>
> >> Pranith
> >>
> >>> Pranith
> >>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> >>>>> Franco,
> >>>>>      Could you do the following to get more information:
> >>>>>
> >>>>> "gluster volume profile <volname> start"
> >>>>>
> >>>>> Wait for some time; this will start gathering what operations are
> >>>>> coming to all the bricks.
> >>>>> Now execute "gluster volume profile <volname> info" > /file/you/should/reply/to/this/mail/with
> >>>>>
> >>>>> Then execute:
> >>>>> gluster volume profile <volname> stop
> >>>>>
> >>>>> Let's see if this throws any light on the problem at hand.
> >>>>>
> >>>>> Pranith
> >>>>> ----- Original Message -----
> >>>>>> From: "Franco Broi" <franco.broi@xxxxxxxxxx>
> >>>>>> To: gluster-users@xxxxxxxxxxx
> >>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> >>>>>> Subject: glusterfsd process spinning
> >>>>>>
> >>>>>> Hi
> >>>>>>
> >>>>>> I've been suffering from continual problems with my gluster
> >>>>>> filesystem slowing down, due to what I thought was congestion on a
> >>>>>> single brick caused by the underlying filesystem running slow, but
> >>>>>> I've just noticed that the glusterfsd process for that particular
> >>>>>> brick is running at 100%+, even when the filesystem is almost idle.
> >>>>>>
> >>>>>> I've done a couple of straces, of that brick and of another on the
> >>>>>> same server; does the high number of futex errors give any clues as
> >>>>>> to what might be wrong?
> >>>>>>
> >>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>  45.58    0.027554           0    191665     20772 futex
> >>>>>>  28.26    0.017084           0    137133           readv
> >>>>>>  26.04    0.015743           0     66259           epoll_wait
> >>>>>>   0.13    0.000077           3        23           writev
> >>>>>>   0.00    0.000000           0         1           epoll_ctl
> >>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>> 100.00    0.060458                395081     20772 total
> >>>>>>
> >>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>  99.25    0.334020         133      2516           epoll_wait
> >>>>>>   0.40    0.001347           0      4090        26 futex
> >>>>>>   0.35    0.001192           0      5064           readv
> >>>>>>   0.00    0.000000           0        20           writev
> >>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>> 100.00    0.336559                 11690        26 total
> >>>>>>
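For reference, this kind of summary can be reproduced by attaching strace
to the brick process in counting mode and interrupting it after a while;
strace prints the per-syscall totals when it detaches. A minimal sketch,
with the pid of the suspect glusterfsd taken from ps or "gluster volume
status" (the pid below is only a placeholder):

# Attach to the brick process: -c counts syscalls instead of printing each
# one, -f follows the brick's threads, and Ctrl-C after ~30s makes strace
# detach and print the totals table.
strace -c -f -p <pid-of-the-data12-glusterfsd>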
> >>>>>> Cheers,
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Gluster-users mailing list
> >>>>>> Gluster-users@xxxxxxxxxxx
> >>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
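Next time the brick starts spinning I'll capture the profile again as
described above. For the record, this is the sequence I plan to run;
<volname> is the same placeholder Pranith used, the sleep is an arbitrary
ten minutes, and the output file is just an example path:

# Start profiling, leave it gathering per-brick stats while the problem is
# visible, dump the counters to a file to post to the list, then turn
# profiling off again.
gluster volume profile <volname> start
sleep 600
gluster volume profile <volname> info > /tmp/profile-$(date +%Y%m%d-%H%M).txt
gluster volume profile <volname> stop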