This is an issue in 3.7.1: the rebalance code path in glusterd is broken. The fix will be released in 3.7.2.

~Atin

On 06/11/2015 12:21 PM, 何亦军 wrote:
> Hi all,
>
> I upgraded my GlusterFS pool from 3.6.2 to 3.7.1; the node servers run CentOS 7.1.1503.
> Some servers work well, but one server has a problem starting glusterd. Can anyone help me?
>
> Some messages below:
>
> [root@gwgfs02 bricks]# systemctl status glusterd
> glusterd.service - GlusterFS, a clustered file-system server
>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
>    Active: failed (Result: signal) since Thu 2015-06-11 14:37:10 CST; 3s ago
>   Process: 4166 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
>  Main PID: 4167 (code=killed, signal=ABRT)
>
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: llistxattr 1
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: setfsid 1
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: spinlock 1
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: epoll.h 1
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: xattr.h 1
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: st_atim.tv_nsec 1
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: package-string: glusterfs 3.7.1
> Jun 11 14:37:10 gwgfs02 etc-glusterfs-glusterd.vol[4167]: ---------
> Jun 11 14:37:10 gwgfs02 systemd[1]: glusterd.service: main process exited, code=killed, status=6/ABRT
> Jun 11 14:37:10 gwgfs02 systemd[1]: Unit glusterd.service entered failed state.
>
> Some log entries from etc-glusterfs-glusterd.vol.log:
> [2015-06-11 06:37:10.187333] W [rdma.c:4493:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
> [2015-06-11 06:37:10.187357] W [rdma.c:4793:init] 0-rdma.management: Failed to initialize IB Device
> [2015-06-11 06:37:10.187367] W [rpc-transport.c:358:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
> [2015-06-11 06:37:10.187473] W [rpcsvc.c:1595:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
> [2015-06-11 06:37:10.187490] E [glusterd.c:1515:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
> [2015-06-11 06:37:10.188848] I [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
> [2015-06-11 06:37:10.189361] I [glusterd-store.c:1986:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30700
> [2015-06-11 06:37:10.189475] I [glusterd.c:154:glusterd_uuid_init] 0-management: retrieved UUID: d79c0a67-155b-43a8-8b51-151cc97aa4da
> [2015-06-11 06:37:10.189557] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
> [2015-06-11 06:37:10.189769] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
> [2015-06-11 06:37:10.189931] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
> [2015-06-11 06:37:10.190093] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
> [2015-06-11 06:37:10.190287] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
> [2015-06-11 06:37:10.190515] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
> [2015-06-11 06:37:10.467359] I [glusterd-handler.c:3387:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
> [2015-06-11 06:37:10.467437] I [glusterd-handler.c:3387:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
> [2015-06-11 06:37:10.467493] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2015-06-11 06:37:10.471021] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
> [2015-06-11 06:37:10.471039] E [socket.c:3015:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
> [2015-06-11 06:37:10.471159] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2015-06-11 06:37:10.474425] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid argument
> [2015-06-11 06:37:10.474442] E [socket.c:3015:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
> Final graph:
> +------------------------------------------------------------------------------+
>   1: volume management
>   2:     type mgmt/glusterd
>   3:     option rpc-auth.auth-glusterfs on
>   4:     option rpc-auth.auth-unix on
>   5:     option rpc-auth.auth-null on
>   6:     option transport.socket.listen-backlog 128
>   7:     option ping-timeout 30
>   8:     option transport.socket.read-fail-log off
>   9:     option transport.socket.keepalive-interval 2
>  10:     option transport.socket.keepalive-time 10
>  11:     option transport-type rdma
>  12:     option working-directory /var/lib/glusterd
>  13: end-volume
>  14:
> +------------------------------------------------------------------------------+
> [2015-06-11 06:37:10.476457] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> [2015-06-11 06:37:10.553448] I [glusterd-rpc-ops.c:464:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: b80f71d0-6944-4236-af96-e272a1f7e739, host: 192.168.0.61, port: 0
> [2015-06-11 06:37:10.572277] I [glusterd-handler.c:2587:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: b80f71d0-6944-4236-af96-e272a1f7e739
> [2015-06-11 06:37:10.572312] I [glusterd-handler.c:2630:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
> [2015-06-11 06:37:10.572628] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
> [2015-06-11 06:37:10.572673] W [socket.c:3059:socket_connect] 0-nfs: Ignore failed connection attempt on , (No such file or directory)
> [2015-06-11 06:37:10.573149] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
> [2015-06-11 06:37:10.575894] W [socket.c:3059:socket_connect] 0-glustershd: Ignore failed connection attempt on , (No such file or directory)
> [2015-06-11 06:37:10.578510] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
> [2015-06-11 06:37:10.581415] W [socket.c:3059:socket_connect] 0-quotad: Ignore failed connection attempt on , (No such file or directory)
> [2015-06-11 06:37:10.581496] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
> [2015-06-11 06:37:10.581539] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
> [2015-06-11 06:37:10.584198] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2015-06-11 06:37:10.588633] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> pending frames:
> frame : type(0) op(0)
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> time of crash:
> 2015-06-11 06:37:10
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.7.1
> /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f15d41c0d92]
> /lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f15d41db9ed]
> /lib64/libc.so.6(+0x35650)[0x7f15d2bb2650]
> /lib64/libc.so.6(gsignal+0x37)[0x7f15d2bb25d7]
> /lib64/libc.so.6(abort+0x148)[0x7f15d2bb3cc8]
> /lib64/libc.so.6(+0x75e07)[0x7f15d2bf2e07]
> /lib64/libc.so.6(__fortify_fail+0x37)[0x7f15d2c8aa57]
> /lib64/libc.so.6(+0x10bc10)[0x7f15d2c88c10]
> /lib64/libc.so.6(+0x10b32b)[0x7f15d2c8832b]
> /lib64/libc.so.6(__snprintf_chk+0x78)[0x7f15d2c88248]
> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_volume_defrag_restart+0x191)[0x7f15c9053931]
> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_restart_rebalance+0x82)[0x7f15c9059aa2]
> /usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_spawn_daemons+0x4f)[0x7f15c9059b1f]
> /lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7f15d41fb482]
> /lib64/libc.so.6(+0x470f0)[0x7f15d2bc40f0]
> ---------
>
> Some log entries from data-brick1-vol01.log:
> [2015-06-11 06:37:10.602714] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> [2015-06-11 06:37:10.612919] W [socket.c:642:__socket_rwv] 0-glusterfs: readv on 192.168.0.62:24007 failed (Connection reset by peer)
> [2015-06-11 06:37:10.613503] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f1074730ee6] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f10744ff36e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f10744ff47e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f1074500e0c] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f10745015c8] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2015-06-11 06:37:10.602886 (xid=0x1)
> [2015-06-11 06:37:10.613550] E [glusterfsd-mgmt.c:1604:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:vol01.gwgfs02.data-brick1-vol01)
> [2015-06-11 06:37:10.613599] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
> [2015-06-11 06:37:10.618382] I [socket.c:3358:socket_submit_request] 0-glusterfs: not connected (priv->connected = 0)
> [2015-06-11 06:37:10.618406] W [rpc-clnt.c:1566:rpc_clnt_submit] 0-glusterfs: failed to submit rpc-request (XID: 0x2 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
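
As a reading aid for the backtrace quoted above, here is a minimal Python sketch that pulls the symbol names out of the backtrace lines and reports the first frame that lives in glusterd.so. The helper names (parse_frames, first_frame_in) are hypothetical and not part of GlusterFS; the sample frames are copied from the quoted log, with the unresolved and logging frames trimmed.

```python
import re

# Frames copied from the backtrace in the quoted glusterd log (innermost first).
BACKTRACE = """\
/lib64/libc.so.6(gsignal+0x37)[0x7f15d2bb25d7]
/lib64/libc.so.6(abort+0x148)[0x7f15d2bb3cc8]
/lib64/libc.so.6(+0x75e07)[0x7f15d2bf2e07]
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f15d2c8aa57]
/lib64/libc.so.6(__snprintf_chk+0x78)[0x7f15d2c88248]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_volume_defrag_restart+0x191)[0x7f15c9053931]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_restart_rebalance+0x82)[0x7f15c9059aa2]
/usr/lib64/glusterfs/3.7.1/xlator/mgmt/glusterd.so(glusterd_spawn_daemons+0x4f)[0x7f15c9059b1f]
"""

# Line shape: object(symbol+offset)[address]; symbol is empty for unresolved frames.
FRAME_RE = re.compile(
    r"^(?P<obj>\S+?)\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-f]+)\)\[(?P<addr>0x[0-9a-f]+)\]$"
)


def parse_frames(text):
    """Return a list of (object_path, symbol_or_None), innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group("obj"), m.group("sym") or None))
    return frames


def first_frame_in(frames, needle):
    """Return the innermost resolved symbol whose object path contains needle."""
    for obj, sym in frames:
        if needle in obj and sym:
            return sym
    return None


frames = parse_frames(BACKTRACE)
crash_site = first_frame_in(frames, "glusterd.so")
print(crash_site)
```

For the quoted trace this reports glusterd_volume_defrag_restart, which is called from glusterd_restart_rebalance and so matches the rebalance code path named in the reply. The libc frames above it (__snprintf_chk, __fortify_fail, abort) suggest that a fortified snprintf bounds check aborted the process, which would explain the signal 6 (ABRT) shown by systemctl.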