Hi Tom,

Which version of Gluster are you running? I talked with my operations team, and they don't recall a log entry mentioning afr_dir_exclusive_crawl, but AFR points to the self-heal mechanism. I therefore suspect you're using Gluster in much the same way we do, with a very large number of file entries in a single folder. In our case we have several million files in our Gluster cluster, and when a self-heal kicks in we can kiss our Gluster goodbye for a couple of hours. We had to disable the self-heal mechanisms on our clusters to prevent that, which increased the chance of split-brain files, so we then developed a daemon to heal those split-brains ourselves.

-----Original Message-----
From: Tom van Leeuwen [mailto:tom.van.leeuwen@xxxxxxxxxxxxx]
Sent: 6 October 2014 03:00
To: Jocelyn Hotte; gluster-users@xxxxxxxxxxx
Subject: Re: 100% CPU WAIT

Hi Jocelyn,

Thanks for your response. I noticed this 100% CPU WAIT on server01 and decided to reboot it. After booting I noticed these two messages:

glustershd.log:[2014-10-03 05:05:46.969650] I [afr-self-heald.c:1180:afr_dir_exclusive_crawl] 0-myvol-replicate-0: Another crawl is in progress for myvol-client-0
glustershd.log:[2014-10-03 05:05:46.970111] I [afr-self-heald.c:1180:afr_dir_exclusive_crawl] 0-myvol2-replicate-0: Another crawl is in progress for myvol2-client-0

It is 2014-10-06 now. myvol has 493G in use and myvol2 has 4G in use. The 100% CPU WAIT is still there. I have no idea where it comes from, and I have no idea whether afr-self-heald is still running.
What triggered me here is that I got a complaint that the initial performance was ~75MB/s throughput on a large write (time dd if=/dev/zero of=2G.bin bs=1M count=2048), and it is now ~40MB/s.

[root@server01 glusterfs]# iostat -k 10 /dev/xvdb   # the myvol disk
Linux 2.6.32-358.6.2.el6.x86_64 (server01)  06-10-14  _x86_64_  (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,49    0,00    0,89    8,45    0,03   90,14

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdb             19,25        67,80        89,64   18013783   23816408

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,35    0,00    1,36   48,74    0,05   49,50

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdb             70,70       282,80         0,00       2828          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,35    0,00    0,90   48,95    0,00   49,80

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdb             67,90       271,60         0,00       2716          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,25    0,00    1,15   48,69    0,00   49,90

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdb             67,20       268,80         0,00       2688          0

I have no idea what it is doing or how to proceed with this issue.

On 03-10-14 16:58, Jocelyn Hotte wrote:
> Hi Tom,
> We experience this behavior when a self-heal is running after bad communication between two nodes, or after a node has crashed.
>
> The way we usually diagnose it is by looking at the mount log (tail -f
> /var/log/gluster/mnt-log), where you should see entries such as afr ...
> self-heal
>
> -----Original Message-----
> From: gluster-users-bounces@xxxxxxxxxxx
> [mailto:gluster-users-bounces@xxxxxxxxxxx] On Behalf Of Tom van Leeuwen
> Sent: 3 October 2014 06:00
> To: gluster-users@xxxxxxxxxxx
> Subject: 100% CPU WAIT
>
> Hi guys,
>
> My glusterfs is causing 100% CPU WAIT according to `top`.
> This has been going on for hours and I have no idea what is causing it.
> How can I troubleshoot?
>
> Iotop reports this:
>
> Total DISK READ: 268.60 K/s | Total DISK WRITE: 0.00 B/s
>   TID  PRIO  USER     DISK READ   DISK WRITE  SWAPIN     IO    COMMAND
>  7899  be/4  root     268.60 K/s    0.00 B/s  0.00 %  96.70 %  glusterfsd
>        -s server01 --volfile-id myvol.server01.glusterfs-brick1
>        -p /var/lib/glusterd/vols/myvol/run/server01-glusterfs-brick1.pid
>        -S /var/run/a7562806405853d2b9382d6fc59051cc.socket
>        --brick-name /glusterfs/brick1
>        -l /var/log/glusterfs/bricks/glusterfs-brick1.log
>        --xlator-option *-posix.glusterd-uuid=07acd5b2-85e6-46f1-8477-038028e8ef7f
>        --brick-port 49152 --xlator-option myvol-server.listen-port=49152
>  1885  be/4  root       0.00 B/s    0.00 B/s  0.00 %   0.98 %  glusterfsd
>        -s server01 --volfile-id myvol.server01.glusterfs-brick1
>        -p /var/lib/glusterd/vols/myvol/run/server01-glusterfs-brick1.pid
>        -S /var/run/a7562806405853d2b9382d6fc59051cc.socket
>        --brick-name /glusterfs/brick1
>        -l /var/log/glusterfs/bricks/glusterfs-brick1.log
>        --xlator-option *-posix.glusterd-uuid=07acd5b2-85e6-46f1-8477-038028e8ef7f
>        --brick-port 49152 --xlator-option myvol-server.listen-port=49152
>
> Kind regards,
> Tom van Leeuwen
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
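For anyone finding this thread later: the checks discussed above (is a heal still running, are there split-brain files, and switching the heal mechanisms off per volume) can all be done from the Gluster CLI. A minimal sketch, assuming a Gluster 3.4-era replicate volume named myvol; these commands only work against a live cluster, and exact output varies by release:

```shell
# Confirm the Gluster version first; heal behaviour differs between releases.
gluster --version

# Ask the self-heal daemon what is still pending. A long (or growing)
# list of entries per brick means a heal crawl is still in progress.
gluster volume heal myvol info

# Files the daemon cannot reconcile on its own show up here.
gluster volume heal myvol info split-brain

# If a heal crawl makes the cluster unusable, the heal mechanisms can be
# disabled per volume, at the cost of having to resolve split-brain
# files manually later (as described earlier in the thread).
gluster volume set myvol cluster.self-heal-daemon off
gluster volume set myvol cluster.data-self-heal off
gluster volume set myvol cluster.metadata-self-heal off
gluster volume set myvol cluster.entry-self-heal off
```

Note that disabling these options stops proactive healing only; re-enable them (set the options back to "on") once the volume is in a state where a heal can be tolerated.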