Hi guys,
I've been running GlusterFS for a couple of days and it's been nice and steady, except for one minor problem: peer probing on my relatively large cluster seems to be stuck for a long time.
Last time, atinm told me on IRC (I was barius.2333 there) that peer probing on a cluster as large as 50+ nodes might take a long time (O(n^2) time), and now my cluster has expanded to 90+ nodes.
The peer probing process was started 4 days ago, when my cluster had ~50 nodes. I probed ~40 nodes at once using subprocesses in bash, and the commands all returned successfully almost immediately (no timeouts).
However, glusterd has kept writing to /var/lib/glusterd/peers/ over the last 4 days, and all commands involving the newly added nodes (e.g. add-brick, mount) time out and fail. Also, running "gluster peer status" on my nodes shows a set of "Disconnected" nodes that varies over time.
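In case it helps with diagnosis, here is a minimal sketch of how the changing set of "Disconnected" peers could be tallied from saved "gluster peer status" output. It assumes the usual "Hostname: / Uuid: / State:" layout of that output; the function names and the sample text (hostnames and UUIDs included) are made up for illustration and are not real output from my cluster.

```python
import re
from collections import Counter

def tally_peer_states(status_text):
    """Count peers by the parenthesised connection state on each 'State:' line."""
    counts = Counter()
    for line in status_text.splitlines():
        match = re.match(r"\s*State:.*\((\w+)\)\s*$", line)
        if match:
            counts[match.group(1)] += 1
    return counts

def disconnected_hosts(status_text):
    """Return hostnames whose 'State:' line ends with '(Disconnected)'."""
    hosts = []
    current = None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and line.endswith("(Disconnected)") and current:
            hosts.append(current)
    return hosts

# Hypothetical sample, shaped like the assumed output format.
SAMPLE = """Number of Peers: 3

Hostname: node-a
Uuid: 00000000-0000-0000-0000-000000000001
State: Peer in Cluster (Connected)

Hostname: node-b
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Disconnected)

Hostname: node-c
Uuid: 00000000-0000-0000-0000-000000000003
State: Accepted peer request (Connected)
"""

if __name__ == "__main__":
    print(dict(tally_peer_states(SAMPLE)))
    print(disconnected_hosts(SAMPLE))
```

Running this against output captured at intervals on a few nodes would show whether the same peers keep dropping or whether the disconnected set really moves around.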
What should I do in this situation? Do I need to wait for the whole peer probing process to complete, or can I simply kill glusterd and restart it?
Regards,
Yiping Peng