Pranith, can you send the client and brick logs?

Thanks,
Susant~

----- Original Message -----
From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
To: "Franco Broi" <franco.broi@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, spalai@xxxxxxxxxx, kdhananj@xxxxxxxxxx, vsomyaju@xxxxxxxxxx, nbalacha@xxxxxxxxxx
Sent: Wednesday, 4 June, 2014 7:53:41 AM
Subject: Re: glusterfsd process spinning

hi Franco,
     CC Devs who work on DHT to comment.

Pranith

On 06/04/2014 07:39 AM, Franco Broi wrote:
> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
>> Franco,
>>      Thanks for providing the logs. I just copied over the logs to my
>> machine. Most of the logs I see are related to "No such File or
>> Directory". I wonder what led to this. Do you have any idea?
> No, but I'm just looking at my 3.5 Gluster volume and it has a directory
> that looks empty but can't be deleted. When I look at the directories on
> the servers there are definitely files in there.
>
> [franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> [franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
> total 8
> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
>
> [root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> /data21/gvol/franco/dir1226/dir25:
> total 2081
> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
>
> /data22/gvol/franco/dir1226/dir25:
> total 2084
> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> .....
>
> Maybe Gluster is losing track of the files??
>
>> Pranith
>>
>> On 06/02/2014 02:48 PM, Franco Broi wrote:
>>> Hi Pranith
>>>
>>> Here's a listing of the brick logs, looks very odd, especially the size
>>> of the log for data10.
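The symptom above is that the FUSE mount shows the directory as empty while the bricks still hold sub-directories, so `rmdir` through the client fails with "Directory not empty". A minimal sketch of that comparison, with hypothetical paths standing in for the real mount point and brick roots (the simulated demo at the bottom just fabricates the mismatch with temporary directories):

```python
import os
import tempfile

def entries_missing_from_client(client_dir, brick_dirs):
    """Return entry names present on at least one brick but absent
    from the client-side view of the same directory."""
    client = set(os.listdir(client_dir))
    on_bricks = set()
    for brick in brick_dirs:
        if os.path.isdir(brick):
            on_bricks.update(os.listdir(brick))
    return sorted(on_bricks - client)

# Simulated demo; in the thread the real arguments would be the
# FUSE path /data2/franco/dir1226/dir25 and the per-brick paths
# /data21/gvol/franco/dir1226/dir25, /data22/gvol/..., etc.
root = tempfile.mkdtemp()
client = os.path.join(root, "client")
os.mkdir(client)  # looks empty, like the client view above
brick = os.path.join(root, "brick21")
os.makedirs(os.path.join(brick, "dir13017"))  # brick still has content

print(entries_missing_from_client(client, [brick]))  # ['dir13017']
```

Any non-empty result means the client's readdir and the bricks disagree, which matches "Gluster is losing track of the files".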
>>>
>>> [root@nas3 bricks]# ls -ltrh
>>> total 2.6G
>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
>>>
>>> Too big to post everything.
>>>
>>> Cheers,
>>>
>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
>>>> ----- Original Message -----
>>>>> From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
>>>>> To: "Franco Broi" <franco.broi@xxxxxxxxxx>
>>>>> Cc: gluster-users@xxxxxxxxxxx
>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
>>>>> Subject: Re: glusterfsd process spinning
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Franco Broi" <franco.broi@xxxxxxxxxx>
>>>>>> To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
>>>>>> Cc: gluster-users@xxxxxxxxxxx
>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
>>>>>> Subject: Re: glusterfsd process spinning
>>>>>>
>>>>>> The volume is almost completely idle now and the CPU for the brick
>>>>>> process has returned to normal. I've included the profile and I think it
>>>>>> shows the latency for the bad brick (data12) is unusually high, probably
>>>>>> indicating the filesystem is at fault after all??
>>>>> I am not sure we can believe the outputs now that you say the brick
>>>>> returned to normal. Next time it is acting up, do the same procedure and
>>>>> post the result.
>>>> On second thought, maybe it's not a bad idea to inspect the log files of
>>>> the bricks in nas3. Could you post them?
>>>>
>>>> Pranith
>>>>
>>>>> Pranith
>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
>>>>>>> Franco,
>>>>>>> Could you do the following to get more information:
>>>>>>>
>>>>>>> "gluster volume profile <volname> start"
>>>>>>>
>>>>>>> Wait for some time; this will start gathering what operations are
>>>>>>> coming to all the bricks.
>>>>>>> Now execute "gluster volume profile <volname> info" > /file/you/should/reply/to/this/mail/with
>>>>>>>
>>>>>>> Then execute:
>>>>>>> gluster volume profile <volname> stop
>>>>>>>
>>>>>>> Let's see if this throws any light on the problem at hand.
>>>>>>>
>>>>>>> Pranith
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Franco Broi" <franco.broi@xxxxxxxxxx>
>>>>>>>> To: gluster-users@xxxxxxxxxxx
>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
>>>>>>>> Subject: glusterfsd process spinning
>>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I've been suffering from continual problems with my gluster filesystem
>>>>>>>> slowing down. I thought the cause was congestion on a single brick
>>>>>>>> whose underlying filesystem was running slow, but I've just noticed
>>>>>>>> that the glusterfsd process for that particular brick is running at
>>>>>>>> 100%+, even when the filesystem is almost idle.
>>>>>>>>
>>>>>>>> I've done a couple of straces, one of that brick and another on the
>>>>>>>> same server; does the high number of futex errors give any clues as
>>>>>>>> to what might be wrong?
>>>>>>>>
>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
>>>>>>>>  45.58    0.027554           0    191665     20772 futex
>>>>>>>>  28.26    0.017084           0    137133           readv
>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
>>>>>>>>   0.13    0.000077           3        23           writev
>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
>>>>>>>> 100.00    0.060458                395081     20772 total
>>>>>>>>
>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
>>>>>>>>   0.40    0.001347           0      4090        26 futex
>>>>>>>>   0.35    0.001192           0      5064           readv
>>>>>>>>   0.00    0.000000           0        20           writev
>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
>>>>>>>> 100.00    0.336559                  11690        26 total
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
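For comparing summaries like the two above mechanically, a small parser helps; this is a sketch assuming the standard `strace -c` column layout (percent time, seconds, usecs/call, calls, optional errors, syscall name), with the sample data taken from the first table in the thread:

```python
def parse_strace_summary(text):
    """Parse an `strace -c` summary into {syscall: (calls, errors)}.

    The errors column is blank for syscalls that had no failures,
    so data rows have either 5 or 6 whitespace-separated fields.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip headers, separator rows, and the final "total" row.
        if len(parts) < 5 or parts[0].startswith(("%", "-")):
            continue
        if parts[-1] == "total":
            continue
        if len(parts) == 6:
            calls, errors = int(parts[3]), int(parts[4])
        else:
            calls, errors = int(parts[3]), 0
        stats[parts[-1]] = (calls, errors)
    return stats

# Data rows from the spinning brick's strace above:
summary = """\
 45.58    0.027554           0    191665     20772 futex
 28.26    0.017084           0    137133           readv
 26.04    0.015743           0     66259           epoll_wait
  0.13    0.000077           3        23           writev
  0.00    0.000000           0         1           epoll_ctl
"""
stats = parse_strace_summary(summary)
calls, errors = stats["futex"]
print(f"futex error rate: {errors / calls:.1%}")  # prints "futex error rate: 10.8%"
```

Roughly one in ten futex calls failing on the busy brick, versus a handful on the healthy process, is the contrast Franco is asking about.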