Pawan - I haven't reached any conclusive analysis so far. But looking at the client (nfs) & glusterd log files, it does look like there is an issue w.r.t. peer connections. Does restarting all the glusterd instances one by one solve this?
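For reference, a minimal sketch of that rolling restart, assuming the Debian-style service name these packages use (not a command given in the thread):

# On each host, one at a time:
service glusterfs-server restart   # or: systemctl restart glusterd, depending on init system
gluster peer status                # confirm peers show Connected before moving to the next host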
On Mon, May 29, 2017 at 4:50 PM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:
Sorry for the big attachment in the previous mail... last 1000 lines of those logs attached now.

On Mon, May 29, 2017 at 4:44 PM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

On Thu, May 25, 2017 at 9:54 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Thu, 25 May 2017 at 19:11, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hello Atin,

Yes, glusterd on the other instances is up and running. Below is the requested output on all three hosts.

Host 1
# gluster peer status
Number of Peers: 2
Hostname: 192.168.0.7
Uuid: 5ec54b4f-f60c-48c6-9e55-95f2bb58f633
State: Peer in Cluster (Disconnected)

Glusterd is disconnected here.
Hostname: 192.168.0.6
Uuid: 83e9a0b9-6bd5-483b-8516-d8928805ed95
State: Peer in Cluster (Disconnected)

Same as above.

Can you please check what the glusterd log has to say about these disconnects?

glusterd keeps logging this every 3s:
[2017-05-29 11:04:52.182782] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 5, Invalid argument
[2017-05-29 11:04:52.182808] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-29 11:04:52.183032] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 20, Invalid argument
[2017-05-29 11:04:52.183052] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-29 11:04:52.183622] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-29 11:04:52.183210 (xid=0x23419)
[2017-05-29 11:04:52.183735] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held
[2017-05-29 11:04:52.183928] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-29 11:04:52.183422 (xid=0x23419)
[2017-05-29 11:04:52.184027] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held
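Since these errors are all on the management connections, one quick sanity check worth running from host 1 is whether the peers' glusterd port (24007, the port seen in the readv errors later in this thread) is reachable at all; a minimal sketch using this thread's IPs, not a command anyone asked for here:

# From host 1:
nc -zv 192.168.0.6 24007
nc -zv 192.168.0.7 24007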
# gluster volume status
Status of volume: shared
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.0.5:/data/exports/shared 49152 0 Y 2105
NFS Server on localhost 2049 0 Y 2089
Self-heal Daemon on localhost N/A N/A Y 2097

Task Status of Volume shared
------------------------------------------------------------------------------
There are no active volume tasks

Volume status output does show all the bricks are up. So I'm not sure why you are seeing the volume as read only. Can you please provide the mount log?

The attached tar has nfs.log, etc-glusterfs-glusterd.vol.log, and glustershd.log from host1.

Host 2
# gluster peer status
Number of Peers: 2
Hostname: 192.168.0.7
Uuid: 5ec54b4f-f60c-48c6-9e55-95f2bb58f633
State: Peer in Cluster (Connected)
Hostname: 192.168.0.5
Uuid: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
State: Peer in Cluster (Connected)
# gluster volume status
Status of volume: shared
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.0.5:/data/exports/shared 49152 Y 2105
Brick 192.168.0.6:/data/exports/shared 49152 Y 2188
Brick 192.168.0.7:/data/exports/shared 49152 Y 2453
NFS Server on localhost 2049 Y 2194
Self-heal Daemon on localhost N/A Y 2199
NFS Server on 192.168.0.5 2049 Y 2089
Self-heal Daemon on 192.168.0.5 N/A Y 2097
NFS Server on 192.168.0.7 2049 Y 2458
Self-heal Daemon on 192.168.0.7 N/A Y 2463
Task Status of Volume shared
------------------------------------------------------------------------------
There are no active volume tasks
Host 3

# gluster peer status
Number of Peers: 2
Hostname: 192.168.0.5
Uuid: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
State: Peer in Cluster (Connected)
Hostname: 192.168.0.6
Uuid: 83e9a0b9-6bd5-483b-8516-d8928805ed95
State: Peer in Cluster (Connected)
# gluster volume status
Status of volume: shared
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.0.5:/data/exports/shared 49152 Y 2105
Brick 192.168.0.6:/data/exports/shared 49152 Y 2188
Brick 192.168.0.7:/data/exports/shared 49152 Y 2453
NFS Server on localhost 2049 Y 2458
Self-heal Daemon on localhost N/A Y 2463
NFS Server on 192.168.0.6 2049 Y 2194
Self-heal Daemon on 192.168.0.6 N/A Y 2199
NFS Server on 192.168.0.5 2049 Y 2089
Self-heal Daemon on 192.168.0.5 N/A Y 2097
Task Status of Volume shared
------------------------------------------------------------------------------
There are no active volume tasks

On Wed, May 24, 2017 at 8:32 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

Are the other glusterd instances up? Output of gluster peer status & gluster volume status please?

On Wed, May 24, 2017 at 4:20 PM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Thanks Atin,

So I got gluster downgraded to 3.7.9 on host 1 and now have the glusterfs and glusterfsd processes come up. But I see the volume is mounted read only.

I see these being logged every 3s:
[2017-05-24 10:45:44.440435] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 17, Invalid argument
[2017-05-24 10:45:44.440475] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.440734] W [socket.c:852:__socket_keepalive] 0-socket: failed to set keep idle -1 on socket 20, Invalid argument
[2017-05-24 10:45:44.440754] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2017-05-24 10:45:44.441354] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24 10:45:44.440945 (xid=0xbf)
[2017-05-24 10:45:44.441505] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held
[2017-05-24 10:45:44.441660] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-24 10:45:44.441086 (xid=0xbf)
[2017-05-24 10:45:44.441790] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] ) 0-management: Lock for vol shared not held
The heal info says this:

# gluster volume heal shared info
Brick 192.168.0.5:/data/exports/shared
Number of entries: 0
Brick 192.168.0.6:/data/exports/shared
Status: Transport endpoint is not connected
Brick 192.168.0.7:/data/exports/shared
Status: Transport endpoint is not connected

Any idea what's up here?

Pawan

On Mon, May 22, 2017 at 9:42 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Mon, May 22, 2017 at 9:05 PM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

On Mon, May 22, 2017 at 8:36 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Mon, May 22, 2017 at 7:51 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

Sorry Pawan, I did miss the other part of the attachments. Looking at the glusterd.info file from all the hosts, it looks like host2 and host3 do not have the correct op-version. Can you please set the op-version as "operating-version=30702" in host2 and host3 and restart the glusterd instances one by one on all the nodes?

Please ensure that all the hosts are upgraded to the same bits before doing this change.
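A concrete sketch of that workaround (the sed one-liner and the service name are assumptions, not commands given in the thread):

# On host2 and host3:
sed -i 's/^operating-version=.*/operating-version=30702/' /var/lib/glusterd/glusterd.info
service glusterfs-server restart   # one node at a time, checking 'gluster peer status' in between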
Having to upgrade all 3 hosts to a newer version before gluster can work successfully on any of them means application downtime. The applications running on these hosts are expected to be highly available. So with the way things are right now, is an online upgrade possible? My upgrade steps are: (1) stop the applications, (2) umount the gluster volume, and then (3) upgrade gluster one host at a time.
One way to mitigate this is to first do an online upgrade to glusterfs-3.7.9 (op-version: 30707), given this bug was introduced in 3.7.10, and then move to 3.11.
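A per-node sketch of that online hop, with the mount point, service name, and package version string all assumed rather than taken from the thread:

# Repeat on one host at a time, waiting for self-heal to finish before the next:
umount /mnt/shared                         # local client mount point (assumed)
service glusterfs-server stop              # Debian-style service name (assumed)
apt-get install glusterfs-server=3.7.9-1   # exact package version string (assumed)
service glusterfs-server start
gluster volume heal shared info            # proceed only once no entries are pending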
Our goal is to get gluster upgraded to 3.11 from 3.6.9, and to make this an online upgrade we are okay to take two steps: 3.6.9 -> 3.7 and then 3.7 to 3.11.

Apparently it looks like there is a bug which you have uncovered: during peer handshaking, if one of the glusterd instances is running with old bits, then while validating the handshake request there is a possibility that the uuid received will be blank, and that used to be ignored. However, there was a patch http://review.gluster.org/13519 with some additional changes that always looked at this field and did some extra checks, which was causing the handshake to fail. For now, the above workaround should suffice. I'll be sending a patch pretty soon.

On Mon, May 22, 2017 at 11:35 AM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hello Atin,

The tar's have the content of /var/lib/glusterd too for all 3 nodes, please check again.

Thanks

On Mon, May 22, 2017 at 11:32 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

Pawan,

I see you have provided the log files from the nodes; however, it'd be really helpful if you can provide me the content of /var/lib/glusterd from all the nodes to get to the root cause of this issue.

On Fri, May 19, 2017 at 12:09 PM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hello Atin,

Thanks for the continued support. I've attached the requested files from all 3 nodes.

(I think we already verified the UUIDs to be correct; anyway, let us know if you find any more info in the logs)

Pawan

On Thu, May 18, 2017 at 11:45 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Thu, 18 May 2017 at 23:40, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Wed, 17 May 2017 at 12:47, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hello Atin,

I realized that these instructions http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/ only work for upgrades from 3.7, while we are running 3.6.2. Are there instructions/suggestions you have for us to upgrade from the 3.6 version? I believe an upgrade from 3.6 to 3.7 and then to 3.10 would work, but I see similar errors reported when I upgraded to 3.7 too.

For what it's worth, I was able to set the op-version (gluster v set all cluster.op-version 30702) but that doesn't seem to help.
[2017-05-17 06:48:33.700014] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.20 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2017-05-17 06:48:33.703808] I [MSGID: 106478] [glusterd.c:1383:init] 0-management: Maximum allowed open file descriptors set to 65536
[2017-05-17 06:48:33.703836] I [MSGID: 106479] [glusterd.c:1432:init] 0-management: Using /var/lib/glusterd as working directory
[2017-05-17 06:48:33.708866] W [MSGID: 103071] [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2017-05-17 06:48:33.709011] W [MSGID: 103055] [rdma.c:4901:init] 0-rdma.management: Failed to initialize IB Device
[2017-05-17 06:48:33.709033] W [rpc-transport.c:359:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2017-05-17 06:48:33.709088] W [rpcsvc.c:1642:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-05-17 06:48:33.709105] E [MSGID: 106243] [glusterd.c:1656:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-05-17 06:48:35.480043] I [MSGID: 106513] [glusterd-store.c:2068:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30600
[2017-05-17 06:48:35.605779] I [MSGID: 106498] [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-05-17 06:48:35.607059] I [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-05-17 06:48:35.607670] I [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-05-17 06:48:35.607025] I [MSGID: 106498] [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-05-17 06:48:35.608125] I [MSGID: 106544] [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.socket.listen-backlog 128
8: option event-threads 1
9: option ping-timeout 0
10: option transport.socket.read-fail-log off
11: option transport.socket.keepalive-interval 2
12: option transport.socket.keepalive-time 10
13: option transport-type rdma
14: option working-directory /var/lib/glusterd
15: end-volume
16:
+------------------------------------------------------------------------------+
[2017-05-17 06:48:35.609868] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-05-17 06:48:35.610839] W [socket.c:596:__socket_rwv] 0-management: readv on 192.168.0.7:24007 failed (No data available)
[2017-05-17 06:48:35.611907] E [rpc-clnt.c:370:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-17 06:48:35.609965 (xid=0x1)
[2017-05-17 06:48:35.611928] E [MSGID: 106167] [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] 0-management: Error through RPC layer, retry again later
[2017-05-17 06:48:35.611944] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-05-17 06:48:35.612024] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) 0-management: Lock for vol shared not held
[2017-05-17 06:48:35.612039] W [MSGID: 106118] [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] 0-management: Lock not released for shared
[2017-05-17 06:48:35.612079] W [socket.c:596:__socket_rwv] 0-management: readv on 192.168.0.6:24007 failed (No data available)
[2017-05-17 06:48:35.612179] E [rpc-clnt.c:370:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-17 06:48:35.610007 (xid=0x1)
[2017-05-17 06:48:35.612197] E [MSGID: 106167] [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk] 0-management: Error through RPC layer, retry again later
[2017-05-17 06:48:35.612211] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-05-17 06:48:35.612292] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] ) 0-management: Lock for vol shared not held
[2017-05-17 06:48:35.613432] W [MSGID: 106118] [glusterd-handler.c:5223:__glusterd_peer_rpc_notify] 0-management: Lock not released for shared
[2017-05-17 06:48:35.614317] E [MSGID: 106170] [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] 0-management: Request from peer 192.168.0.6:991 has an entry in peerinfo, but uuid does not match

Apologies for the delay. My initial suspicion was correct. You have an incorrect UUID in the peer file which is causing this. Can you please provide me the

Clicked the send button accidentally!

Can you please send me the content of /var/lib/glusterd & the glusterd log from all the nodes?

On Mon, May 15, 2017 at 10:31 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Mon, 15 May 2017 at 11:58, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hi Atin,

I see the below error. Do I require gluster to be upgraded on all 3 hosts for this to work? Right now I have host 1 running 3.10.1 and hosts 2 & 3 running 3.6.2.
# gluster v set all cluster.op-version 31001
volume set: failed: Required op_version (31001) is not supported

Yes you should, given the 3.6 version is EOLed.

On Mon, May 15, 2017 at 3:32 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Sun, 14 May 2017 at 21:43, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

Alright, I see that you haven't bumped up the op-version. Can you please execute: gluster v set all cluster.op-version 30101 and then restart glusterd on all the nodes and check the brick status?

s/30101/31001

On Sun, May 14, 2017 at 8:55 PM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hello Atin,

Thanks for looking at this. Below is the output you requested.

Again, I'm seeing those errors after upgrading gluster on host 1.
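Before and after the bump, each node's actual version and on-disk op-version can be confirmed with the same commands and files used elsewhere in this thread:

# On every host:
gluster --version
grep operating-version /var/lib/glusterd/glusterd.info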
Host 1

# cat /var/lib/glusterd/glusterd.info
UUID=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
operating-version=30600
# cat /var/lib/glusterd/peers/*
uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633
state=3
hostname1=192.168.0.7
uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95
state=3
hostname1=192.168.0.6
# gluster --version
glusterfs 3.10.1
Host 2

# cat /var/lib/glusterd/glusterd.info
UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95
operating-version=30600
# cat /var/lib/glusterd/peers/*
uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633
state=3
hostname1=192.168.0.7
uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
state=3
hostname1=192.168.0.5
# gluster --version
glusterfs 3.6.2 built on Jan 21 2015 14:23:44

Host 3
# cat /var/lib/glusterd/glusterd.info
UUID=5ec54b4f-f60c-48c6-9e55-95f2bb58f633
operating-version=30600
# cat /var/lib/glusterd/peers/*
uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
state=3
hostname1=192.168.0.5
uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95
state=3
hostname1=192.168.0.6
# gluster --version
glusterfs 3.6.2 built on Jan 21 2015 14:23:44

On Sat, May 13, 2017 at 6:28 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

I have already asked for the following earlier:

cat /var/lib/glusterd/peers/*

On Sat, 13 May 2017 at 12:22, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hello folks,

Does anyone have any idea what's going on here?

Thanks,
Pawan

On Wed, May 10, 2017 at 5:02 PM, Pawan Alwandi <pawan@xxxxxxxxxxx> wrote:

Hello,

I'm trying to upgrade gluster from 3.6.2 to 3.10.1 but don't see the glusterfsd and glusterfs processes coming up. http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/ is the process that I'm trying to follow.

This is a 3 node server setup with a replicated volume having a replica count of 3.

Logs below:
[2017-05-10 09:07:03.507959] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.1 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2017-05-10 09:07:03.512827] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[2017-05-10 09:07:03.512855] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[2017-05-10 09:07:03.520426] W [MSGID: 103071] [rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2017-05-10 09:07:03.520452] W [MSGID: 103055] [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device
[2017-05-10 09:07:03.520465] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2017-05-10 09:07:03.520518] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-05-10 09:07:03.520534] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-05-10 09:07:04.931764] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30600
[2017-05-10 09:07:04.964354] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
[2017-05-10 09:07:04.993944] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-05-10 09:07:04.995864] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-05-10 09:07:04.995879] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-05-10 09:07:04.995903] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-05-10 09:07:04.996325] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.socket.listen-backlog 128
8: option event-threads 1
9: option ping-timeout 0
10: option transport.socket.read-fail-log off
11: option transport.socket.keepalive-interval 2
12: option transport.socket.keepalive-time 10
13: option transport-type rdma
14: option working-directory /var/lib/glusterd
15: end-volume
16:
+------------------------------------------------------------------------------+
[2017-05-10 09:07:04.996310] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-05-10 09:07:05.000461] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-05-10 09:07:05.001493] W [socket.c:593:__socket_rwv] 0-management: readv on 192.168.0.7:24007 failed (No data available)
[2017-05-10 09:07:05.001513] I [MSGID: 106004] [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-05-10 09:07:05.001677] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared not held
[2017-05-10 09:07:05.001696] W [MSGID: 106118] [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] 0-management: Lock not released for shared
[2017-05-10 09:07:05.003099] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-10 09:07:05.000627 (xid=0x1)
[2017-05-10 09:07:05.003129] E [MSGID: 106167] [glusterd-handshake.c:2181:__glusterd_peer_dump_version_cbk] 0-management: Error through RPC layer, retry again later
[2017-05-10 09:07:05.003251] W [socket.c:593:__socket_rwv] 0-management: readv on 192.168.0.6:24007 failed (No data available)
[2017-05-10 09:07:05.003267] I [MSGID: 106004] [glusterd-handler.c:5882:__glusterd_peer_rpc_notify] 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-05-10 09:07:05.003318] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) [0x7f0bf9d74559] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) [0x7f0bf9d7dcf0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) [0x7f0bf9e29ba3] ) 0-management: Lock for vol shared not held
[2017-05-10 09:07:05.003329] W [MSGID: 106118] [glusterd-handler.c:5907:__glusterd_peer_rpc_notify] 0-management: Lock not released for shared
[2017-05-10 09:07:05.003457] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f0bfec904bf] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f0bfec91c21] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2017-05-10 09:07:05.001407 (xid=0x1)

There are a bunch of errors reported, but I'm not sure which is signal and which is noise. Does anyone have any idea what's going on here?

Thanks,
Pawan
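For anyone landing on this thread with the same symptom, a minimal sketch for checking whether the daemons came up at all (not a command from the thread):

# List the management daemon, brick processes, and any client mounts:
pgrep -a glusterd
pgrep -a glusterfsd
pgrep -a glusterfs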
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users