Harry-
Thanks for the tip. My problem could well have been the same as yours. I have known for some time that "gluster peer status" doesn't give useful connection information, but I didn't know about the "gluster volume status" commands; they must be new in version 3.3. I usually discover connection problems by seeing phrases like "disconnected" and "anomalies" in the logs. This has been happening more often since I upgraded to version 3.3, and I suspect it is being caused by the very high load experienced by some servers. I have seen this load problem discussed in other threads. The next time I attempt a rebalance operation I will run "gluster volume status all detail" first to check connectivity.

-Dan

On 08/08/2012 08:31 PM, Harry Mangalam wrote:
> This sounds similar, though not identical, to a problem that I had
> recently (described here:
> <http://gluster.org/pipermail/gluster-users/2012-August/011054.html>).
> My problems were the result of starting this kind of
> rebalance with a server node appearing to be connected (via the
> 'gluster peer status' output) but not actually being connected, as
> shown by the 'gluster volume status all detail' output. Note
> especially the part that describes its online state.
>
> ------------------------------------------------------------------------------
> Brick : Brick pbs3ib:/bducgl
> Port : 24018
> Online : N <<=====================
> Pid : 20953
> File System : xfs
>
>
> You may have already verified this, but what I did was to start a
> rebalance / fix-layout with a disconnected brick and it went ahead and
> tried to do it, unsuccessfully as you might guess. But when I
> finally was able to reconnect the downed brick and restart the
> rebalance, it (astonishingly) was able to bring everything back. So
> props to the gluster team.
>
> hjm
>
>
> On Wed, Aug 8, 2012 at 11:58 AM, Dan Bretherton
> <d.a.bretherton at reading.ac.uk> wrote:
>
>     Hello All-
>     I have noticed another problem after upgrading to version 3.3. I
>     am unable to do "gluster volume rebalance <VOLUME> fix-layout
>     status" or "...fix-layout ... stop" after starting a rebalance
>     operation with "gluster volume rebalance <VOLUME> fix-layout
>     start". The fix-layout operation seemed to be progressing
>     normally on all the servers according to the log files, but all
>     attempts to do "status" or "stop" result in the CLI usage message
>     being returned. The only references to the rebalance commands in
>     the log files were these, which all the servers seem to have one
>     or more of.
>
>     [root at romulus glusterfs]# grep rebalance *.log
>     etc-glusterfs-glusterd.vol.log:[2012-08-08 12:49:04.870709] W
>     [socket.c:1512:__socket_proto_state_machine] 0-management: reading
>     from socket failed. Error (Transport endpoint is not connected),
>     peer
>     (/var/lib/glusterd/vols/tracks/rebalance/cb21050d-05c2-42b3-8660-230954bab324.sock)
>     tracks-rebalance.log:[2012-08-06 10:41:18.550241] I
>     [graph.c:241:gf_add_cmdline_options] 0-tracks-dht: adding option
>     'rebalance-cmd' for volume 'tracks-dht' with value '4'
>
>     The volume name is "tracks", by the way. I wanted to stop the
>     rebalance operation because it seemed to be causing a very high
>     load on some of the servers and had been running for several days.
>     I ended up having to manually kill the rebalance processes on all
>     the servers and then restart glusterd.
>
>     After that I found that one of the servers had
>     "rebalance_status=4" in the file
>     /var/lib/glusterd/vols/tracks/node_state.info, whereas all the
>     others had "rebalance_status=0". I manually changed the '4' to '0'
>     and restarted glusterd. I don't know if this was a consequence of
>     the way I had killed the rebalance operation or the cause of the
>     strange behaviour.
>     I don't really want to start another rebalance
>     going just to test this, because the last one was so disruptive.
>
>     Has anyone else experienced this problem since upgrading to 3.3?
>
>     Regards,
>     Dan.
>
> --
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
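
The pre-rebalance check described in this thread (run "gluster volume status all detail" and look for bricks reporting "Online : N") could be scripted roughly as follows. This is only a minimal sketch in Python, not a tested tool: the function names offline_bricks and check_before_rebalance are hypothetical, and the parser assumes the output keeps the "Brick : ..." / "Online : ..." field layout shown in Harry's excerpt, which may differ between GlusterFS versions.

```python
import subprocess


def offline_bricks(status_text):
    """Return the bricks whose 'Online' field is 'N' in the given status text.

    Assumes the 'Field : value' layout quoted in the thread (hypothetical parser).
    """
    offline = []
    current_brick = None
    for line in status_text.splitlines():
        # Split on the first colon only; brick names such as host:/path contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator rows and blank lines carry no field
        key, value = key.strip(), value.strip()
        if key == "Brick":
            # The quoted sample reads "Brick : Brick pbs3ib:/bducgl"; drop the repeated word.
            if value.startswith("Brick "):
                value = value[len("Brick "):]
            current_brick = value
        elif key == "Online" and current_brick is not None:
            # Look only at the first token so trailing annotations are ignored.
            if value.split()[:1] == ["N"]:
                offline.append(current_brick)
    return offline


def check_before_rebalance():
    """Run the status command named in the thread; return True if no brick is offline."""
    result = subprocess.run(
        ["gluster", "volume", "status", "all", "detail"],
        capture_output=True,
        text=True,
        check=True,
    )
    bricks = offline_bricks(result.stdout)
    for brick in bricks:
        print("offline brick:", brick)
    return not bricks
```

Calling check_before_rebalance() on a server before "gluster volume rebalance <VOLUME> fix-layout start" would then list any brick that looks disconnected. The parsing is kept separate from the command invocation so it can be tried against saved output without touching a live cluster.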