Looks like you have a problem getting to one of your servers:

[2014-02-03 21:03:06.231215] E [socket.c:2157:socket_connect_finish] 0-bigdata-client-0: connection to x.x.x.x:49153 failed (No route to host)

On Mon, 2014-02-03 at 16:15 -0600, Branden Timm wrote:
> I should mention that the following line from the log is also worrying,
> as each trusted server is running Gluster v. 3.4.2, as verified by
> running /usr/sbin/glusterd -V:
>
> Using Program GlusterFS 3.3, Num (1298437), Version (330)
>
> Branden
>
> On 2/3/2014 3:35 PM, Branden Timm wrote:
> > Hello,
> > I'm experiencing some major problems with my GlusterFS filesystem
> > after an upgrade/expansion, and I'm hoping I can get pointed in the
> > right direction for troubleshooting it.
> >
> > I had a 5 server, 5 brick distributed volume on 3.3.1. I brought the
> > volume offline, stopped glusterd and glusterfsd on all servers, then
> > upgraded to 3.4.2 and brought glusterd and glusterfsd back online. So
> > far so good.
> >
> > Once the volume was back online and healthy, I added a new server to
> > the trusted storage pool and added two bricks attached to that server
> > to the volume. Everything looked fine so far; gluster volume status
> > showed all six servers and seven bricks as online.
> >
> > The problem came next when I tried to rebalance. I ran "gluster
> > volume rebalance <volname> start force", then once it returned ran
> > "status" and saw that the rebalance failed on all but one node, which
> > showed in progress. The node that it was running successfully on was
> > a pre-existing server, not the new server/brick(s). The other five
> > servers report "1 subvolume(s) are down. Skipping fix layout."
> > Somebody in the IRC channel suggested this means that one of my bricks
> > is down, but "gluster volume status <volname>" reports all servers
> > and bricks as being online. Full pastebin of the rebalance log
> > (essentially the same on all five failing servers) here:
> > http://fpaste.org/74082/14615971/
> >
> > Currently, I have both missing files and files that report "Transport
> > endpoint not connected" when they are accessed. This really seems to be
> > related to the rebalance failures, and the layout seems incorrect as
> > well. Really hoping somebody can point me in the right direction of
> > where to look next. Thanks in advance for any help.
> >
> > -Branden
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
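
If it helps narrow down the "No route to host" error in the log line quoted at the top of this reply, here is a rough sketch of a connectivity check you could run from each server against every other server. It is not a Gluster tool, only a plain TCP connect test. The function name probe_brick and the script itself are made up for illustration, the port 49153 is taken from that one log line (other bricks may listen on other ports), and the hostnames are whatever you pass on the command line.

```python
import errno
import socket
import sys

# Port taken from the failing log line; other bricks may use different ports.
BRICK_PORT = 49153


def probe_brick(host, port, timeout=3.0):
    """Attempt a TCP connection and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets silently dropped.
        return "timeout"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "unresolvable"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition the Gluster client logged as "No route to host".
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            # Host is reachable but nothing is listening on that port.
            return "connection refused"
        return errno.errorcode.get(exc.errno, "unknown error")


if __name__ == "__main__":
    # Usage: python probe_brick.py server1 server2 ...
    for host in sys.argv[1:]:
        print(f"{host}:{BRICK_PORT} -> {probe_brick(host, BRICK_PORT)}")
```

A "no route to host" result from some servers but not others would point at routing or a firewall rule on the path to that brick rather than at the brick process itself, which would also fit "gluster volume status" showing every brick online.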