On 05/29/2015 01:29 PM, Félix de Lelelis wrote:
> Hi,
>
> I have a cluster with 3 nodes in pre-production. Yesterday, one node was
> down. The error that I have seen is this:
>
>
> [2015-05-28 19:04:27.305560] E [glusterd-syncop.c:1578:gd_sync_task_begin]
> 0-management: Unable to acquire lock for cfe-gv1
> The message "I [MSGID: 106006]
> [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs
> has disconnected from glusterd." repeated 5 times between [2015-05-28
> 19:04:09.346088] and [2015-05-28 19:04:24.349191]
> pending frames:
> frame : type(0) op(0)
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 11
> time of crash:
> 2015-05-28 19:04:27
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.6.1
> /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fd86e2f1232]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fd86e30871d]
> /usr/lib64/libc.so.6(+0x35640)[0x7fd86d30c640]
> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_remove_pending_entry+0x2c)[0x7fd85f52450c]
> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x5ae28)[0x7fd85f511e28]
> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x237)[0x7fd85f50f027]
> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_brick_op_cbk+0x2fe)[0x7fd85f53be5e]
> /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7fd85f53d48c]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fd86e0c50b0]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x171)[0x7fd86e0c5321]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fd86e0c1273]
> /usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0x8530)[0x7fd85d17d530]
> /usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0xace4)[0x7fd85d17fce4]
> /usr/lib64/libglusterfs.so.0(+0x76322)[0x7fd86e346322]
> /usr/sbin/glusterd(main+0x502)[0x7fd86e79afb2]
> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd86d2f8af5]
> /usr/sbin/glusterd(+0x6351)[0x7fd86e79b351]
> ---------
>
>
> Is that a problem with the software? Is it a bug?

The problem I see here is that concurrent volume status transactions were
run at the same point in time (from the cmd log history in BZ 1226254).
3.6.1 is missing some fixes that address the issues identified along these
lines. If you upgrade your cluster to 3.6.3, the problem will go away.
However, 3.6.3 still misses one more fix, http://review.gluster.org/#/c/10023/,
which will be released in 3.6.4. I would request you to upgrade your cluster
to 3.6.3, if not 3.7.

> Thanks.

--
~Atin
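
For reference, a minimal sketch of the kind of concurrency that can trip the
cluster-wide lock on 3.6.1 and produce the "Unable to acquire lock" error
above. The node names are placeholders; only the volume name cfe-gv1 is taken
from the log, and this is an illustration rather than the exact reproduction
from BZ 1226254:

    # Two status transactions issued at (nearly) the same time contend for
    # glusterd's cluster-wide lock; on 3.6.1 this can end in the crash above.
    ssh node1 'gluster volume status cfe-gv1' &
    ssh node2 'gluster volume status cfe-gv1' &
    wait

    # After upgrading, confirm the running package on every node:
    gluster --version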