Re: Mount Fails when 1 of 2 Replicas is Down (GlusterFS 3.7.2)

10.1.0.100 is the IP of the replica server that is down. However, this log is from the replica server that is up. There are only 2 servers, and both are replicas for the volume. The errors show up when I attempt to mount the volume from a client. It seems the server that's up is trying to contact the server that's down, and the mount fails as a result?

I also noticed the following error repeating continuously in the glusterd log while the other node is down. Is this normal?

[2015-07-02 06:16:18.028223] W [glusterd-locks.c:653:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x199)[0x7f1d94a9bd59] (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x47a)[0x7f1d8fa30efa] (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2a2)[0x7f1d8f9abda2] (--> /usr/lib64/glusterfs/3.7.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f1d8f9a3700] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a8)[0x7f1d9486c458] ))))) 0-management: Lock for vol test not held


On Wed, Jul 1, 2015 at 5:03 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
On Tuesday 30 June 2015 10:56 PM, Gabriel Kuri wrote:
I am able to reproduce a problem, which I think may be a bug, where if 1
of the 2 replica servers for a volume is down, clients are unable to
mount the volume. I notice that if the replica that is down is on the
same subnet as the client, the client fails to mount the volume, but if
the replica that is down is on a different subnet, the client fails over
properly and mounts the volume.

Here are the errors from the server that is still up when the client is
unable to mount the volume when the replica on the same subnet as the
client is down. Ideas? Should I open a bug?

[2015-07-01 05:43:08.428657] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 21, Invalid argument
[2015-07-01 05:43:08.428710] E [socket.c:3015:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-07-01 05:43:08.429260] E [socket.c:3071:socket_connect] 0-management: connection attempt on 10.1.0.100:24007 failed, (Connection refused)


This points to the client not being able to talk to glusterd on 10.1.0.100. Is glusterd running on this node, and if so, can port 24007 be reached from the client machine?

Regards,
Vijay
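
For anyone wanting to check the reachability question above from the client machine, here is a minimal sketch in Python. It only tests whether a TCP connection to a given host and port can be opened; it does not speak the glusterd protocol. The function name can_reach and the 3-second timeout are illustrative assumptions, not anything from GlusterFS. The address 10.1.0.100 and port 24007 are taken from the log in this thread.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper for illustration only. A refused connection,
    an unreachable host, and a timeout all count as "not reachable".
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and routing errors.
        return False


# Example use from the client machine (values from the log above):
#   print(can_reach("10.1.0.100", 24007))
```

A result of False for the node that is down is expected and matches the "Connection refused" line in the log. The more useful check is whether the same call returns True for the replica that is still up.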

