Help, peer probe seems to get stuck on large cluster.

Hi guys,


I've been running GlusterFS for a couple of days and it's been nice and steady, except for one minor problem: peer probing on my relatively large cluster seems to have been stuck for a long time.


Last time, atinm told me on IRC (I was barius.2333 there) that peer probing on a cluster of 50+ nodes might take a long time (O(n^2) time), and my cluster has now expanded to 90+ nodes.
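
As a rough illustration of what the O(n^2) estimate means for this growth, here is a back-of-the-envelope sketch. It assumes the cost scales with the number of distinct node pairs, n(n-1)/2; that is only an assumption for illustration, not something confirmed from glusterd itself, and `peer_pairs` is a hypothetical helper name.

```python
def peer_pairs(n: int) -> int:
    """Number of distinct node pairs in a cluster of n nodes (assumed cost model)."""
    return n * (n - 1) // 2


# Compare the cluster size when probing started with the current size.
for nodes in (50, 90):
    print(nodes, "nodes ->", peer_pairs(nodes), "pairs")
```

Under that assumption, going from 50 to 90 nodes raises the pair count from 1225 to 4005, a bit more than three times as much work.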


The peer probing was started 4 days ago, when my cluster had ~50 nodes. I probed ~40 nodes at once, running the probe commands in parallel as bash subprocesses, and all of the commands returned successfully almost immediately (no time-outs).


However, glusterd has kept writing to /var/lib/glusterd/peers/ for the last 4 days, and all commands involving the newly added nodes (e.g. add-brick, mount) time out and fail. Also, running “gluster peer status” on my nodes shows “Disconnected” nodes, and the set of disconnected nodes varies over time.
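
One way to track how the “Disconnected” set changes is to tally the states reported by “gluster peer status” at intervals. The sketch below is a minimal example, assuming each peer entry carries a line of the form "State: Peer in Cluster (Connected)"; `count_peer_states` is a hypothetical helper, and the sample text uses made-up hostnames and UUIDs purely for illustration.

```python
import re
from collections import Counter


def count_peer_states(status_text: str) -> Counter:
    """Tally the parenthesised connection state found on each 'State:' line."""
    states = re.findall(r"^State:.*\((\w+)\)\s*$", status_text, flags=re.MULTILINE)
    return Counter(states)


# Illustrative input only: hostnames and UUIDs are invented, not real output.
sample = """\
Number of Peers: 2

Hostname: node-01.example
Uuid: 00000000-0000-0000-0000-000000000001
State: Peer in Cluster (Connected)

Hostname: node-02.example
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Disconnected)
"""

print(count_peer_states(sample))
```

On a real node the input could come from `subprocess.run(["gluster", "peer", "status"], capture_output=True, text=True).stdout`; comparing the tallies from a few runs would show whether the number of disconnected peers is shrinking or just fluctuating.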


What should I do in this situation? Do I need to wait for the whole peer probing process to complete, or can I simply kill glusterd and restart it?


Regards,

Yiping Peng
