rpc/glusterd-locks error

Good morning.

We have a 6 node cluster. 3 nodes are participating in a replica 3 volume.
Naming convention:
hosts ending in 01 (xx01, yy01, zz01) - the 3 nodes participating in ovirtprod_vol
hosts ending in 02 - the 3 nodes NOT participating in ovirtprod_vol

Last week, we restarted glusterd on each node in the cluster (one at a time) to apply an update.
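
For context, each node's restart went roughly like this (a sketch; the actual package update steps are omitted, and systemd service management is assumed):

    # on one node at a time, waiting for peers to reconnect before moving on
    systemctl restart glusterd
    gluster peer status                   # confirm all peers show Connected
    gluster volume status ovirtprod_vol   # confirm bricks are back online
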
Since then, all three volume nodes (xx01, yy01, zz01) show the following in glusterd.log:

[2018-02-26 14:31:47.330670] E [socket.c:2020:__socket_read_frag] 0-rpc: wrong MSG-TYPE (29386) received from 172.26.30.9:24007
[2018-02-26 14:31:47.330879] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2322a) [0x7f46020e922a] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2d198) [0x7f46020f3198] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0xe4755) [0x7f46021aa755] ) 0-management: Lock for vol ovirtprod_vol not held
[2018-02-26 14:31:47.331066] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f460d64dedb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f460d412e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f460d412f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f460d414710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f460d415200] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2018-02-26 14:31:47.330496 (xid=0x72e0)
[2018-02-26 14:31:47.333993] E [socket.c:2020:__socket_read_frag] 0-rpc: wrong MSG-TYPE (84253) received from 172.26.30.8:24007
[2018-02-26 14:31:47.334148] W [glusterd-locks.c:843:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2322a) [0x7f46020e922a] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2d198) [0x7f46020f3198] -->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0xe4755) [0x7f46021aa755] ) 0-management: Lock for vol ovirtprod_vol not held
[2018-02-26 14:31:47.334317] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f460d64dedb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f460d412e6e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f460d412f8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f460d414710] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f460d415200] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2018-02-26 14:31:47.333824 (xid=0x1494b)
[2018-02-26 14:31:48.511390] E [socket.c:2632:socket_poller] 0-socket.management: poll error on socket

Additionally, each of the three volume nodes shows connectivity to only two of the three hosts (itself and one other), and the missing peer differs on each node: xx01 sees itself and yy01, yy01 sees itself and zz01, and zz01 sees itself and xx01.
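
The per-node check was roughly the following (gluster pool list is assumed here since its output includes the local node; gluster peer status gives the same picture without it):

    # expected on every volume node: all three peers Connected
    gluster pool list
    # observed instead, e.g. on xx01: only xx01 (localhost) and yy01
    # Connected, with zz01 reported Disconnected
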

However, the 02 hosts (same cluster, not participating in the volume) report the volume info as healthy, with all three 01 hosts participating in the volume.

In our dev environment, we had to stop the volume and restart glusterd on all hosts to recover. For production, though, that would mean a system-wide outage and downtime, which we need to avoid.
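
For reference, the dev recovery was along these lines, which is exactly the outage we want to avoid in prod (volume name shown for illustration):

    gluster volume stop ovirtprod_vol    # takes the volume offline for all clients
    # then, on every node in the cluster:
    systemctl restart glusterd
    # finally, from any one node:
    gluster volume start ovirtprod_vol
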

Any suggestions? Thanks.

vk
--------------------------------
Vineet Khandpur
UNIX System Administrator
Information Technology Services
University of Alberta Libraries
+1-780-492-4718
