My guess is that there is corruption in the volume list or the peer list, which has led glusterd into an infinite loop while traversing that list and is causing it to hog the CPU. Again, this is only a guess; I have not yet had a chance to take a detailed look at the logs and the strace output.
I believe the problem will disappear if you reboot the node again.
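
To illustrate the kind of failure I am guessing at (this is not glusterd code, just a minimal Python sketch with hypothetical names such as Node, has_cycle and find_bounded): if a list entry ends up pointing back at an earlier entry, a lookup that compares names while walking the list never reaches the end and spins forever on string comparisons. A cycle check or a step limit would expose that.

```python
class Node:
    """One entry of a singly linked list (hypothetical stand-in for a peer/volume entry)."""
    def __init__(self, name):
        self.name = name
        self.next = None


def build_list(names):
    """Build a well-formed list and return (head, list_of_nodes)."""
    nodes = [Node(n) for n in names]
    for cur, nxt in zip(nodes, nodes[1:]):
        cur.next = nxt
    return (nodes[0] if nodes else None), nodes


def has_cycle(head):
    """Floyd's tortoise-and-hare check: True if following .next never terminates."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


def find_bounded(head, target, max_steps):
    """Look up an entry by name, but give up after max_steps hops.

    Without the step limit, a search for a missing name on a
    corrupted (cyclic) list would loop forever comparing strings.
    """
    node, steps = head, 0
    while node is not None:
        if steps >= max_steps:
            raise RuntimeError("list traversal exceeded step limit; list may be corrupted")
        if node.name == target:
            return node
        node = node.next
        steps += 1
    return None


if __name__ == "__main__":
    head, nodes = build_list(["peer1", "peer2", "peer3"])
    print(has_cycle(head))          # well-formed list
    nodes[-1].next = nodes[0]       # simulate corruption: tail points back to head
    print(has_cycle(head))          # corrupted list
```

If something like this is what is happening, it would be consistent with the process sitting in userspace at full CPU, but I would need to check the logs before claiming that.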
On Tue, 22 Aug 2017 at 20:07, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
In addition, perf top shows 80% in libc-2.12.so __strcmp_sse42 while
glusterd is at 100% CPU usage.
Hope this helps...
On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
> Hi there,
>
> I have a strange problem.
> Gluster version is 3.10.5, and I am testing new servers. Gluster
> configuration is 16+4 EC. I have three volumes, each with 1600 bricks.
> I can successfully create the cluster and volumes without any
> problems. I wrote data to the cluster from 100 clients for 12 hours, again
> with no problem. But when I try to reboot a node, the glusterd process hangs
> at 100% CPU usage and seems to do nothing; no brick processes come
> online. You can find an strace of the glusterd process for 1 minute here:
>
> https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
>
> Here are the glusterd logs:
> https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
>
>
> By the way, rebooting a server completes without problems if I reboot
> the servers before creating any volumes.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
--
- Atin (atinm)