On Tue, Nov 14, 2017 at 2:47 PM, Emmanuel Dreyfus <manu@xxxxxxxxxx> wrote:
On Tue, Nov 14, 2017 at 12:17:05PM +0530, Atin Mukherjee wrote:
> > gluster volume status also exhibits trouble: each server lists only
> > its own bricks, not the other's. I suspect it could just
> > be a timeout caused by slow answers from the peer.
> Have you checked the output of gluster peer status? Also, does the glusterd log
> file give any hint of timeouts, RPC failures, disconnections, etc.?
gluster peer status says "State: Sent and Received peer request (Connected)"
on both sides.
So this is why the peers don't understand that they are connected. The friend handshake got stuck midway and never recovered. Restarting the glusterd services should ideally fix the state; if it does not, you'd have to manually edit the /var/lib/glusterd/peers/UUID files to set state=3 and then restart the glusterd service.
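The manual edit described above could be sketched roughly as follows. This is only an illustration under assumptions: it assumes each peer file is plain text containing a single `state=` line (alongside entries such as `uuid=` and `hostname1=`), and the helper names `set_peer_state` and `fix_peer_files` are hypothetical, not part of any gluster tooling. Stop glusterd and copy the peers directory somewhere else before trying anything like this.

```python
from pathlib import Path
from typing import List

TARGET_STATE = "3"  # the value suggested in the message above


def set_peer_state(text: str, new_state: str = TARGET_STATE) -> str:
    """Return peer-file text with every state= line set to new_state.

    Assumes a key=value file, one entry per line (an assumption about
    the on-disk layout, not something confirmed in this thread).
    """
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.startswith("state="):
            lines[i] = "state=" + new_state
            found = True
    if not found:
        raise ValueError("no state= line found")
    return "\n".join(lines) + "\n"


def fix_peer_files(peers_dir: str = "/var/lib/glusterd/peers",
                   dry_run: bool = True) -> List[str]:
    """Rewrite state= in each file under peers_dir.

    Returns the names of the files that would change (dry_run=True)
    or that were changed (dry_run=False).
    """
    changed = []
    for path in sorted(Path(peers_dir).iterdir()):
        if not path.is_file():
            continue
        old = path.read_text()
        new = set_peer_state(old)
        if new != old:
            changed.append(path.name)
            if not dry_run:
                path.write_text(new)
    return changed
```

Calling `fix_peer_files()` with the default `dry_run=True` only reports which peer files are not already at state=3; nothing is written until `dry_run=False` is passed, after which glusterd still has to be restarted as described above.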
I have this in glusterd.log:
[2017-11-14 08:49:47.289423] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd3e on port 49155
[2017-11-14 08:49:52.289926] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd0e on port 49152
[2017-11-14 08:49:52.295394] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd1e on port 49153
[2017-11-14 08:49:52.302973] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd2e on port 49154
[2017-11-14 08:54:31.535066] W [socket.c:593:__socket_rwv] 0-management: readv on 192.0.2.109:24007 failed (Connection reset by peer)
[2017-11-14 08:54:32.567745] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer <bidon> (<2d7719d9-0466-434c-a881-4081156fac47>), in state <Probe Sent to Peer>, has disconnected from glusterd.
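For anyone wanting to pull the connection-related entries out of a longer glusterd.log, a minimal sketch is shown here. The regular expression is inferred only from the lines quoted above, not from any documented log format, and the names `parse_line` and `management_events` are hypothetical helpers.

```python
import re

# Pattern inferred from the excerpt above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The MSGID field is optional, since some lines above lack it.
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<domain>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def management_events(lines):
    """Keep 0-management entries that are warnings, errors, or disconnects."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["domain"] != "0-management":
            continue
        if rec["level"] in ("W", "E") or "disconnected" in rec["msg"]:
            events.append(rec)
    return events
```

Feeding it an open log file, for example `management_events(open("glusterd.log"))` with whatever path the log has on the system at hand, would skip the pmap registration lines and keep the readv failure and the peer disconnect.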
An odd thing: the registration messages suggest the local bricks should
show as online in gluster volume status output. They are displayed as
offline until I kill the glusterfsd processes and issue a
gluster volume start gfs force.
Along with the symmetrical entries, the peer has this:
[2017-11-14 08:56:05.799686] E [socket.c:2369:socket_connect_finish] 0-management: connection to 192.0.2.110:24007 failed (Connection timed out); disconnecting socket
In the meantime I tracked the performance problem to extended attribute
system calls. The root of the problem is outside of glusterfs, but fixing
the consequences would be nice.
--
Emmanuel Dreyfus
manu@xxxxxxxxxx
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel