Hi,
We are using glusterfs version 3.12 with 3 bricks. I find that occasionally, after rebooting all 3 SN nodes simultaneously, the glusterd process on one SN node gets stuck: when you try to execute any gluster command it does not respond.
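For anyone trying to reproduce this, the hang can be confirmed and the backtraces collected roughly like this on the affected node (the pid and thread numbers will of course differ):

# gluster volume status        <-- hangs / times out on the stuck node
# gdb -p $(pidof glusterd)
(gdb) info threads
(gdb) thread apply all bt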
Below is the glusterd log from the time of the hang, followed by the gdb info of the stuck glusterd process.
[2019-01-28 14:38:24.999329] I [rpc-clnt.c:1048:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2019-01-28 14:38:24.999450] I [rpc-clnt.c:1048:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2019-01-28 14:38:24.999597] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: a4315121-e127-42ca-9869-0fe451216a80
[2019-01-28 14:38:24.999692] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 7f694d7e-7613-4298-8da1-50cbb73ed47e, host: mn-0.local, port: 0
[2019-01-28 14:38:25.010624] W [socket.c:593:__socket_rwv] 0-management: readv on 192.168.1.14:24007 failed (No data available)
[2019-01-28 14:38:25.010774] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <mn-0.local> (<7f694d7e-7613-4298-8da1-50cbb73ed47e>), in state <Peer in Cluster>, has disconnected from glusterd.
[2019-01-28 14:38:25.010860] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d) [0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol ccs not held
[2019-01-28 14:38:25.010903] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for ccs
[2019-01-28 14:38:25.010931] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d) [0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol encryptfile not held
[2019-01-28 14:38:25.010945] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for encryptfile
[2019-01-28 14:38:25.010971] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d) [0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol export not held
[2019-01-28 14:38:25.010983] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for export
[2019-01-28 14:38:25.011006] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d) [0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol log not held
[2019-01-28 14:38:25.011046] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for log
[2019-01-28 14:38:25.011070] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d) [0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol mstate not held
[2019-01-28 14:38:25.011082] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for mstate
[2019-01-28 14:38:25.011104] W [glusterd-locks.c:846:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x1bba4) [0x7f1828aa0ba4] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x2ed5d) [0x7f1828ab3d5d] -->/usr/lib64/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0xf7635) [0x7f1828b7c635] ) 0-management: Lock for vol services not held
[2019-01-28 14:38:25.011115] W [MSGID: 106118] [glusterd-handler.c:6342:__glusterd_peer_rpc_notify] 0-management: Lock not released for services
[2019-01-28 14:38:25.011268] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e9)[0x7f182df24a99] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1f9)[0x7f182dce6f27] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7f182dce701a] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7f182dce756e] (--> /lib64/libgfrpc.so.0(+0x12f9e)[0x7f182dce7f9e] ))))) 0-management: forced unwinding frame type(Peer mgmt) op(--(4)) called at 2019-01-28 14:38:24.999851 (xid=0x7)
[2019-01-28 14:38:25.011286] E [MSGID: 106158] [glusterd-rpc-ops.c:684:__glusterd_friend_update_cbk] 0-management: RPC Error
[2019-01-28 14:38:25.011301] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received RJT from uuid: 00000000-0000-0000-0000-000000000000
[2019-01-28 14:38:25.011424] I [MSGID: 106006] [glusterd-svc-mgmt.c:328:glusterd_svc_common_rpc_notify] 0-management: glustershd has connected with glusterd.
[2019-01-28 14:38:25.011599] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /mnt/bricks/ccs/brick on port 53952
[2019-01-28 14:38:25.011862] I [MSGID: 106492] [glusterd-handler.c:2718:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 7f694d7e-7613-4298-8da1-50cbb73ed47e
[2019-01-28 14:38:25.021058] I [MSGID: 106502] [glusterd-handler.c:2763:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2019-01-28 14:38:25.021090] I [socket.c:3704:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-01-28 14:38:25.021099] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xa, Program: GlusterD svc peer, ProgVers: 2, Proc: 4) to rpc-transport (socket.management)
[2019-01-28 14:38:25.021108] E [MSGID: 106430] [glusterd-utils.c:568:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2019-01-28 14:38:25.021126] E [rpcsvc.c:559:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2019-01-28 14:38:25.021135] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xa, Program: GlusterD svc peer, ProgVers: 2, Proc: 4) to rpc-transport (socket.management)
[2019-01-28 14:38:25.021147] W [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: failed to queue error reply
The gdb backtraces of the stuck glusterd threads are:
(gdb) thread 8
[Switching to thread 8 (Thread 0x7f1826ec2700 (LWP 2418))]
#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f182ccea677 in __lll_lock_elision () from /lib64/libpthread.so.0
#2 0x00007f182df5cae6 in iobref_unref () from /lib64/libglusterfs.so.0
#3 0x00007f182dce2f29 in rpc_transport_pollin_destroy () from /lib64/libgfrpc.so.0
#4 0x00007f1827ccf319 in socket_event_poll_in () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so
#5 0x00007f1827ccf932 in socket_event_handler () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so
#6 0x00007f182df925d4 in event_dispatch_epoll_handler () from /lib64/libglusterfs.so.0
#7 0x00007f182df928ab in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#8 0x00007f182ccde5da in start_thread () from /lib64/libpthread.so.0
#9 0x00007f182c5b4e8f in clone () from /lib64/libc.so.6
(gdb) thread 9
[Switching to thread 9 (Thread 0x7f18266c1700 (LWP 2419))]
#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f182cce787c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f182cce2b42 in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2 0x00007f182cce44c8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3 0x00007f1827ccadab in socket_event_poll_err () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so
#4 0x00007f1827ccf99c in socket_event_handler () from /usr/lib64/glusterfs/3.12.3/rpc-transport/socket.so
#5 0x00007f182df925d4 in event_dispatch_epoll_handler () from /lib64/libglusterfs.so.0
#6 0x00007f182df928ab in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7 0x00007f182ccde5da in start_thread () from /lib64/libpthread.so.0
#8 0x00007f182c5b4e8f in clone () from /lib64/libc.so.6
Have you ever encountered this issue? From the gdb info it seems the epoll threads are deadlocked.
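For illustration only, here is a minimal generic C sketch (not GlusterFS code, all names are made up) of one lock-ordering pattern that would leave two threads parked in __lll_lock_wait exactly like the backtraces above; whether this is really what happens between the iobref lock and the socket private lock in glusterd, I cannot tell from the backtraces alone:

/* deadlock_demo.c: build with gcc -pthread deadlock_demo.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for the two locks involved. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *worker_1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);   /* holds A ...                        */
    sleep(1);
    pthread_mutex_lock(&lock_b);   /* ... blocks here in __lll_lock_wait */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *worker_2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b);   /* holds B ...                        */
    sleep(1);
    pthread_mutex_lock(&lock_a);   /* ... blocks here in __lll_lock_wait */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker_1, NULL);
    pthread_create(&t2, NULL, worker_2, NULL);
    /* Both joins hang forever; "thread apply all bt" in gdb shows both
     * workers stuck in __lll_lock_wait, much like threads 8 and 9 above. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("not reached\n");
    return 0;
}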