NFS crashes on Gluster 3.4.0

Hi,

Since we switched to NFS (because of the many small files), we have been having serious problems with Gluster's NFS daemon. About once a day, the Gluster NFS process crashes on one of the machines and does not come back up until I restart the Gluster daemon on that node. Sometimes the affected node even crashes again after the restart.

We have a ~2TB volume with 6 bricks on 5 servers, accessed by 12 NFS clients and one FUSE client.

In the nfs logs there's something like the following:

tail -n 100 /var/log/glusterfs/nfs.log
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
[...]
frame : type(0) op(0)

signal received: 11
time of crash: 2013-08-15 14:08:39
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0
/lib/x86_64-linux-gnu/libc.so.6(+0x364c0)[0x7fac361904c0]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7fac36523a50]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x36)[0x7fac36b96966]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/protocol/client.so(client_local_wipe+0x28)[0x7fac31f6a4f8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/protocol/client.so(client3_3_opendir_cbk+0x19c)[0x7fac31f8353c]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7fac36957bd5]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xc5)[0x7fac36957f35]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x27)[0x7fac36954627]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/rpc-transport/socket.so(+0xa1d1)[0x7fac32e091d1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/rpc-transport/socket.so(+0xa81c)[0x7fac32e0981c]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5e553)[0x7fac36bbd553]
/usr/sbin/glusterfs(main+0x3e3)[0x7fac37007883]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fac3617b76d]
/usr/sbin/glusterfs(+0x5c79)[0x7fac37007c79]
---------




Is there anything we can do to prevent this, or at least to find the cause? At the moment we use an ugly workaround: a cron job checks the NFS status and restarts the server if necessary, but that is not something we consider suitable for larger deployments.
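
For reference, the cron workaround amounts to something like the following Python sketch. It rests on assumptions: that the Gluster NFS server on the node accepts TCP connections on port 2049 while it is alive, and that the restart command passed in is the right one for the local system. The function names `nfs_port_open` and `check_and_restart` are made up for illustration.

```python
import socket
import subprocess


def nfs_port_open(host="127.0.0.1", port=2049, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    # Assumption: port 2049 is where the Gluster NFS server listens on this node.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_restart(restart_cmd, host="127.0.0.1", port=2049, run=subprocess.call):
    """Run restart_cmd if the NFS port is closed.

    Returns True if a restart was attempted, False if the port was reachable.
    restart_cmd is an argument list handed to `run` (subprocess.call by default).
    """
    if nfs_port_open(host, port):
        return False
    run(restart_cmd)
    return True
```

A cron job on each server would call, for example, `check_and_restart(["service", "glusterfs-server", "restart"])`; the service name here is an assumption and depends on the distribution's packaging. This only detects a dead NFS process and restarts it, it does nothing about the crash itself.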

