Re: Gluster server crashes with signal 11 after probing peers.

Great, upgrading to 3.7.10 did indeed fix this issue.

On 2016-03-31 21:07, Atin Mukherjee wrote:
On 03/31/2016 11:18 PM, Ernie Dunbar wrote:
Oops. I replied to Mohammed and not the whole list. Here's the backtrace
and the full backtrace too:

root@nfs3:/home/ernied# gdb /usr/sbin/glusterd /core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/glusterd...(no debugging symbols
found)...done.

warning: core file may not match specified executable file.
[New LWP 1519]
[New LWP 1520]
[New LWP 1516]
[New LWP 1780]
[New LWP 1518]
[New LWP 1517]
[New LWP 1781]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
24    ../nptl/sysdeps/x86_64/pthread_spin_lock.S: No such file or
directory.

(gdb) bt

#0 pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x00007fb81dee520d in __gf_free () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fb81deaa625 in data_destroy () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007fb81dead1cd in dict_get_str () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#4  0x00007fb8193f52f9 in glusterd_xfer_cli_probe_resp ()
from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so
#5  0x00007fb8193f6017 in __glusterd_handle_cli_probe ()
from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so
#6  0x00007fb8193ee9a0 in glusterd_big_locked_handler ()
from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so
#7  0x00007fb81def38d2 in synctask_wrap () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#8  0x00007fb81d2c38b0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x0000000000000000 in ?? ()

(gdb) bt full

#0 pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
No locals.
#1  0x00007fb81dee520d in __gf_free () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
No symbol table info available.
#2  0x00007fb81deaa625 in data_destroy () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
No symbol table info available.
#3  0x00007fb81dead1cd in dict_get_str () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
No symbol table info available.
#4  0x00007fb8193f52f9 in glusterd_xfer_cli_probe_resp ()
from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so
No symbol table info available.
#5  0x00007fb8193f6017 in __glusterd_handle_cli_probe ()
from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so
No symbol table info available.
#6  0x00007fb8193ee9a0 in glusterd_big_locked_handler ()
from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so
No symbol table info available.
#7  0x00007fb81def38d2 in synctask_wrap () from
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
No symbol table info available.
#8  0x00007fb81d2c38b0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.
This looks similar to BZ 1310677. Unfortunately the fix missed 3.7.9
and will be available in 3.7.10.
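
Because the fix is tied to a release number, a quick check like the one below can tell whether a given version string is at or past 3.7.10. This is only a sketch: the names FIXED_IN, parse_version and has_probe_fix are made up for illustration, and the sample strings mirror the "package-string: glusterfs 3.7.9" line from the crash log.

```python
import re

# First release reported in this thread to carry the fix (assumption taken
# from the thread itself, not from release notes).
FIXED_IN = (3, 7, 10)


def parse_version(text):
    """Extract the first dotted numeric version (e.g. '3.7.9') as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_probe_fix(version_text):
    """Return True when the version in version_text is FIXED_IN or newer."""
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    for sample in ("package-string: glusterfs 3.7.9", "glusterfs 3.7.10"):
        state = "has the fix" if has_probe_fix(sample) else "affected"
        print(f"{sample!r}: {state}")
```

Comparing integer tuples matters here: as plain strings, "3.7.10" sorts before "3.7.9", which would give the wrong answer.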

~Atin


On 2016-03-30 23:15, Mohammed Rafi K C wrote:
Hi Ernie,

Can you please paste the backtrace from the core file?
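
For anyone collecting the same information, the backtrace can also be produced without an interactive gdb session. The sketch below builds such a command line using gdb's -batch and -ex options. The helper name gdb_backtrace_command is hypothetical, and the binary and core paths are simply the ones used in this thread, so adjust them for your system.

```python
import shlex


def gdb_backtrace_command(binary, core):
    """Build a gdb command line that runs 'bt' and 'bt full', then exits."""
    return ["gdb", "-batch", "-ex", "bt", "-ex", "bt full", binary, core]


if __name__ == "__main__":
    # Paths taken from this thread; they are assumptions for any other system.
    cmd = gdb_backtrace_command("/usr/sbin/glusterd", "/core")
    # Print a copy-pasteable shell command instead of running gdb here.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, capture_output=True, text=True) to capture the output for pasting into a reply. Installing the debug symbol packages first should make the frames more useful.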

Regards
Rafi KC

On 03/31/2016 02:31 AM, Ernie Dunbar wrote:
Hi everyone.

I'm trying to add a new Gluster node to our cluster. When I probe the
first node in the cluster from the new node, the new node crashes with
the following report (logs start when the daemon starts):

---------
[2016-03-30 20:32:05.191659] I [MSGID: 100030]
[glusterfsd.c:2332:main] 0-/usr/sbin/glusterd: Started running
/usr/sbin/glusterd version 3.7.9 (args: /usr/sbin/glusterd -p
/var/run/glusterd.pid)
[2016-03-30 20:32:05.195695] I [MSGID: 106478] [glusterd.c:1337:init]
0-management: Maximum allowed open file descriptors set to 65536
[2016-03-30 20:32:05.195752] I [MSGID: 106479] [glusterd.c:1386:init]
0-management: Using /var/lib/glusterd as working directory
[2016-03-30 20:32:05.200609] W [MSGID: 103071]
[rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
channel creation failed [No such device]
[2016-03-30 20:32:05.200648] W [MSGID: 103055] [rdma.c:4901:init]
0-rdma.management: Failed to initialize IB Device
[2016-03-30 20:32:05.200662] W
[rpc-transport.c:359:rpc_transport_load] 0-rpc-transport: 'rdma'
initialization failed
[2016-03-30 20:32:05.200723] W [rpcsvc.c:1597:rpcsvc_transport_create]
0-rpc-service: cannot create listener, initing the transport failed
[2016-03-30 20:32:05.200743] E [MSGID: 106243] [glusterd.c:1610:init]
0-management: creation of 1 listeners failed, continuing with
succeeded transport
[2016-03-30 20:32:07.135310] I [MSGID: 106513]
[glusterd-store.c:2062:glusterd_restore_op_version] 0-glusterd:
retrieved op-version: 30501
[2016-03-30 20:32:07.135775] I [MSGID: 106498]
[glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo]
0-management: connect returned 0
[2016-03-30 20:32:07.135876] I
[rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2016-03-30 20:32:07.136651] W [socket.c:870:__socket_keepalive]
0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid
argument
[2016-03-30 20:32:07.136673] E [socket.c:2966:socket_connect]
0-management: Failed to set keep-alive: Invalid argument
[2016-03-30 20:32:07.136908] I [MSGID: 106194]
[glusterd-store.c:3523:glusterd_store_retrieve_missed_snaps_list]
0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+


  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.socket.listen-backlog 128
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:
+------------------------------------------------------------------------------+


[2016-03-30 20:32:07.138287] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
thread with index 1
[2016-03-30 20:32:07.138980] I [MSGID: 106544]
[glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID:
ae191e96-9cd6-4e2b-acae-18f2cc45e6ed
[2016-03-30 20:32:07.139422] I [MSGID: 106163]
[glusterd-handshake.c:1194:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 30501
[2016-03-30 20:32:14.394056] I [MSGID: 106487]
[glusterd-handler.c:1239:__glusterd_handle_cli_probe] 0-glusterd:
Received CLI probe req nfs1 24007
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-03-30 20:32:14
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.9
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f0401a78562]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f0401a9464d]
/lib/x86_64-linux-gnu/libc.so.6(+0x36d40)[0x7f0400e76d40]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7f04012120f0]
---------


Both nodes are running GlusterFS 3.7.9 on Ubuntu Trusty Tahr (14.04
LTS). Node 1 is running Linux kernel 3.13.0-55-generic #94-Ubuntu SMP,
and node 3 is running Linux kernel 3.13.0-77-generic #121-Ubuntu SMP.
To me, this seems to be the only difference between the systems,
although the new node has the very latest version of the Gluster
packages from the launchpad.net PPA. I would imagine that Node 1 has
the same update, but it's hard to tell.

Any help would be much appreciated.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users