Re: lttng duplicate registration problem when using librados2 and libradosstriper

It looks like the issue you are experiencing was fixed in the Infernalis/master branches [1].  I've opened a new tracker ticket to backport the fix to Hammer [2].

-- 

Jason Dillaman 

[1] https://github.com/sponce/ceph/commit/e4c27d804834b4a8bc495095ccf5103f8ffbcc1e
[2] http://tracker.ceph.com/issues/13210

----- Original Message -----
> From: "Paul Mansfield" <paul.mansfield@xxxxxxxxxxxxxxxxxx>
> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, September 23, 2015 6:25:36 AM
> Subject: Re:  lttng duplicate registration problem when using librados2 and libradosstriper
> 
> On 22/09/15 19:48, Jason Dillaman wrote:
> > It's not the best answer, but it is the reason why it is currently
> > disabled on RHEL 7.  Best bet for finding a long-term solution is
> > still probably attaching with gdb and catching the abort function
> > call.  Once the offending probe can be found, we can figure out how to
> > fix it.
> 
> I tried gdb and strace. I didn't find anything that gave me any insight.
> 
> Here's the result of running it with gdb. I've not used gdb in anger in
> years, so quite possibly I'm doing it wrong.
> 
> $ gdb ./testprogram
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /foo/bar/testprogram...done.
> (gdb) handle SIGABRT stop nopass
> Signal        Stop      Print   Pass to program Description
> SIGABRT       Yes       Yes     No              Aborted
> (gdb) start
> Temporary breakpoint 1 at 0x4017ac: file testprogram, line 184.
> Starting program: /foo/bar/testprogram
> [Thread debugging using libthread_db enabled]
> [New Thread 0x7fffed9da700 (LWP 53014)]
> [New Thread 0x7fffed1d9700 (LWP 53015)]
> LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate
> registration of tracepoint probes having the same name is not allowed.
> 
> Program received signal SIGABRT, Aborted.
> 0x00007ffff24b8925 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> CUnit-2.1.2-6.el6.x86_64 boost-system-1.41.0-18.el6.x86_64
> boost-thread-1.41.0-18.el6.x86_64 cassandra-cpp-driver-2.0.1-1.el6.amd64
> glibc-2.12-1.132.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64
> krb5-libs-1.10.3-15.el6_5.1.x86_64 libcom_err-1.41.12-18.el6.x86_64
> libgcc-4.4.7-4.el6.x86_64 librados2-0.94.3-0.el6.x86_64
> libradosstriper1-0.94.3-0.el6.x86_64
> libselinux-2.0.94-5.3.el6_4.1.x86_64 libstdc++-4.4.7-4.el6.x86_64
> libuuid-2.17.2-12.14.el6.x86_64 libuv-1.2.1-1.el6.x86_64
> lttng-ust-2.4.1-1.el6.x86_64 nspr-4.10.2-1.el6_5.x86_64
> nss-3.15.3-6.el6_5.x86_64 nss-util-3.15.3-1.el6_5.x86_64
> openssl-1.0.1e-16.el6_5.7.x86_64 userspace-rcu-0.7.7-1.el6.x86_64
> zlib-1.2.3-29.el6.x86_64
> (gdb) backtrace
> #0  0x00007ffff24b8925 in raise () from /lib64/libc.so.6
> #1  0x00007ffff24ba105 in abort () from /lib64/libc.so.6
> #2  0x00007ffff58c58f4 in ?? () from /usr/lib64/librados.so.2
> #3  0x00007ffff58f4936 in ?? () from /usr/lib64/librados.so.2
> #4  0x00007fffffffe9a8 in ?? ()
> #5  0x0000000000000001 in ?? ()
> #6  0x00007fffffffe9a8 in ?? ()
> #7  0x00007ffff555f51b in _init () from /usr/lib64/librados.so.2
> #8  0x00007ffff7fea000 in ?? ()
> #9  0x00007ffff7deb555 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
> #10 0x00007ffff7dddb3a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
> #11 0x0000000000000001 in ?? ()
> #12 0x00007fffffffec44 in ?? ()
> #13 0x0000000000000000 in ?? ()
> 
> 
> 
> This didn't tell me much. I tried using "nm" on the librados and
> libradosstriper libraries and there was no symbol information.
> 
> 
> I also tried strace, which revealed two subprocesses:
> 
> $ grep "/dev/shm/lttng-ust" strace.out
> [pid 49682] open("/dev/shm/lttng-ust-wait-5",
> O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
> [pid 49683] open("/dev/shm/lttng-ust-wait-5-2489",
> O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
> 
> 
> 
> $ grep "pid 49682" strace.out | more
> [pid 49682] set_robust_list(0x7fe69cb5b9e0, 0x18 <unfinished ...>
> [pid 49682] <... set_robust_list resumed> ) = 0
> [pid 49682] socket(PF_FILE, SOCK_STREAM, 0 <unfinished ...>
> [pid 49682] <... socket resumed> )      = 3
> [pid 49682] fcntl(3, F_SETFD, FD_CLOEXECProcess 49683 attached
> [pid 49682] connect(3, {sa_family=AF_FILE,
> path="/var/run/lttng/lttng-ust-sock-5"}, 110 <unfinished ...>
> [pid 49682] <... connect resumed> )     = -1 ENOENT (No such file or
> directory)
> [pid 49682] close(3 <unfinished ...>
> [pid 49682] <... close resumed> )       = 0
> [pid 49682] statfs("/dev/shm/",  <unfinished ...>
> [pid 49682] <... statfs resumed> {f_type=0x1021994, f_bsize=4096,
> f_blocks=8242437, f_bfree=8242435, f_bavail=8242435, f_files=8242437,
> f_ffree=8242434, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
> [pid 49682] futex(0x7fe69fd6b300, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> [pid 49682] open("/dev/shm/lttng-ust-wait-5",
> O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
> [pid 49682] fcntl(3, F_GETFD)           = 0x1 (flags FD_CLOEXEC)
> [pid 49682] read(3, "\0\0\0\0", 4)      = 4
> [pid 49682] mmap(NULL, 4096, PROT_READ, MAP_SHARED, 3, 0) = 0x7fe6a717b000
> [pid 49682] close(3)                    = 0
> [pid 49682] futex(0x7fe6a13f15e0, FUTEX_WAKE_PRIVATE, 1) = 1
> [pid 49682] futex(0x7fe6a717b000, FUTEX_WAIT, 0, NULL <unfinished ...>
> [pid 49682] +++ killed by SIGABRT (core dumped) +++
> 
> 
> $ grep "pid 49683" strace.out | more
> [pid 49683] set_robust_list(0x7fe69c35a9e0, 0x18 <unfinished ...>
> [pid 49683] <... set_robust_list resumed> ) = 0
> [pid 49683] socket(PF_FILE, SOCK_STREAM, 0 <unfinished ...>
> [pid 49683] <... socket resumed> )      = 4
> [pid 49683] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
> [pid 49683] connect(4, {sa_family=AF_FILE,
> path="/p4/.lttng/lttng-ust-sock-5"}, 110 <unfinished ...>
> [pid 49683] <... connect resumed> )     = -1 ENOENT (No such file or
> directory)
> [pid 49683] close(4 <unfinished ...>
> [pid 49683] <... close resumed> )       = 0
> [pid 49683] futex(0x7fe6a13f15e0, FUTEX_WAIT_PRIVATE, 2, NULL
> <unfinished ...>
> [pid 49683] <... futex resumed> )       = 0
> [pid 49683] futex(0x7fe6a13f1620, FUTEX_WAKE_PRIVATE, 1) = 1
> [pid 49683] futex(0x7fe6a13f15e0, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 49683] open("/dev/shm/lttng-ust-wait-5-2489",
> O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
> [pid 49683] read(3, "\0\0\0\0", 4)      = 4
> [pid 49683] fstat(3, {st_mode=S_IFREG|0640, st_size=4096, ...}) = 0
> [pid 49683] getuid( <unfinished ...>
> [pid 49683] <... getuid resumed> )      = 2489
> [pid 49683] mmap(NULL, 4096, PROT_READ, MAP_SHARED, 3, 0) = 0x7fe6a717a000
> [pid 49683] close(3)                    = 0
> [pid 49683] futex(0x7fe6a13f15e0, FUTEX_WAKE_PRIVATE, 1) = 1
> [pid 49683] futex(0x7fe6a717a000, FUTEX_WAIT, 0, NULL <unfinished ...>
> [pid 49683] +++ killed by SIGABRT (core dumped) +++
> 
> 
> 
> 
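For anyone repeating the strace step above, the per-pid grep can be done in one pass. The following is only a rough sketch, not part of the fix: it assumes a raw strace -f output file in which each syscall sits on a single line with a "[pid N]" prefix (unlike the mail-wrapped excerpts quoted above), and the function name lttng_paths_by_pid is made up for illustration.

```python
import re
from collections import defaultdict

# "[pid N]" prefix that strace -f puts in front of each traced line.
PID_RE = re.compile(r"^\[pid\s+(\d+)\]\s+(.*)$")
# Any double-quoted argument whose text mentions "lttng".
PATH_RE = re.compile(r'"([^"]*lttng[^"]*)"')


def lttng_paths_by_pid(lines):
    """Map each pid to the lttng-related paths it touched, in first-seen order.

    Hypothetical helper: assumes unwrapped strace -f lines.
    """
    paths = defaultdict(list)
    for line in lines:
        match = PID_RE.match(line.strip())
        if not match:
            continue  # continuation lines or lines without a pid prefix
        pid, rest = int(match.group(1)), match.group(2)
        for path in PATH_RE.findall(rest):
            if path not in paths[pid]:
                paths[pid].append(path)
    return dict(paths)


if __name__ == "__main__":
    import sys

    # Usage (assumed file name): python lttng_paths.py strace.out
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for pid, found in sorted(lttng_paths_by_pid(handle).items()):
                print(pid, ", ".join(found))
```

This only summarizes which LTTng-UST sockets and wait files each pid opened; it does not identify the duplicated tracepoint probe itself.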
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


