Re: MDS crash, wont startup again

Hi,

I was using the Debian packages, but I have now tried building from source.
I used the same version from Git
(cb7f1c9c7520848b0899b26440ac34a8acea58d1) and compiled it. Same crash
report.
Then I applied your patch, but I get the same crash again. I think the
backtrace is also the same:

 (gdb) thread 1
[Switching to thread 1 (Thread 9564)]#0  0x00007f33a3e58ebb in raise
(sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
41      in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
(gdb) backtrace
#0  0x00007f33a3e58ebb in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
#1  0x000000000081423e in reraise_fatal (signum=11) at
global/signal_handler.cc:58
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
#3  <signal handler called>
#4  SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
at mds/snap.cc:112
#5  0x000000000055d58b in MDCache::check_realm_past_parents
(this=0x27a7200, realm=0x0)
    at mds/MDCache.cc:4495
#6  0x0000000000572eec in
MDCache::choose_lock_states_and_reconnect_caps (this=0x27a7200)
    at mds/MDCache.cc:4533
#7  0x00000000005931a0 in MDCache::rejoin_gather_finish
(this=0x27a7200) at mds/MDCache.cc:4444
#8  0x000000000059b9d5 in MDCache::rejoin_send_rejoins
(this=0x27a7200) at mds/MDCache.cc:3388
#9  0x00000000004a8721 in MDS::rejoin_joint_start (this=0x27bc000) at
mds/MDS.cc:1404
#10 0x00000000004c253a in MDS::handle_mds_map (this=0x27bc000,
m=<value optimized out>)
    at mds/MDS.cc:968
#11 0x00000000004c4513 in MDS::handle_core_message (this=0x27bc000,
m=0x27ab800) at mds/MDS.cc:1651
#12 0x00000000004c45ef in MDS::_dispatch (this=0x27bc000, m=0x27ab800)
at mds/MDS.cc:1790
#13 0x00000000004c628b in MDS::ms_dispatch (this=0x27bc000,
m=0x27ab800) at mds/MDS.cc:1602
#14 0x0000000000732609 in Messenger::ms_deliver_dispatch
(this=0x279f680) at msg/Messenger.h:178
#15 SimpleMessenger::dispatch_entry (this=0x279f680) at
msg/SimpleMessenger.cc:363
#16 0x00000000007207ad in SimpleMessenger::DispatchThread::entry() ()
#17 0x00007f33a3e508ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#18 0x00007f33a26d892d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()

Any more ideas? :)
Or can I get you more debugging output?



2012/5/23 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Wed, May 23, 2012 at 5:28 AM, Felix Feinhals
> <ff@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hey,
>>
>> OK, I installed libc-dbg and ran your commands. Now this comes up:
>>
>> gdb /usr/bin/ceph-mds core
>>
>> snip
>>
>> GNU gdb (GDB) 7.0.1-debian
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/bin/ceph-mds...Reading symbols from
>> /usr/lib/debug/usr/bin/ceph-mds...done.
>> (no debugging symbols found)...done.
>> [New Thread 22980]
>> [New Thread 22984]
>> [New Thread 22986]
>> [New Thread 22979]
>> [New Thread 22970]
>> [New Thread 22981]
>> [New Thread 22971]
>> [New Thread 22976]
>> [New Thread 22973]
>> [New Thread 22975]
>> [New Thread 22974]
>> [New Thread 22972]
>> [New Thread 22978]
>> [New Thread 22982]
>>
>> warning: Can't read pathname for load map: Input/output error.
>> Reading symbols from /lib/libpthread.so.0...Reading symbols from
>> /usr/lib/debug/lib/libpthread-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/libpthread.so.0
>> Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libcrypto++.so.8
>> Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libuuid.so.1
>> Reading symbols from /lib/librt.so.1...Reading symbols from
>> /usr/lib/debug/lib/librt-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/librt.so.1
>> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libtcmalloc.so.0
>> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libstdc++.so.6
>> Reading symbols from /lib/libm.so.6...Reading symbols from
>> /usr/lib/debug/lib/libm-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/libm.so.6
>> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libgcc_s.so.1
>> Reading symbols from /lib/libc.so.6...Reading symbols from
>> /usr/lib/debug/lib/libc-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/libc.so.6
>> Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols
>> from /usr/lib/debug/lib/ld-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libunwind.so.7
>> Core was generated by `/usr/bin/ceph-mds -i c --pid-file
>> /var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
>> 41      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
>>        in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>>
>> snip
>>
>> Now
>>
>> thread apply all bt
>>
>> ...
>>
>> thread 1
>> [Switching to thread 1 (Thread 22977)]#0  0x00007f10c00d2ebb in raise
>> (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
>> 41      in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>>
>>
>> Thread 1 (Thread 22977):
>> ---Type <return> to continue, or q <return> to quit---
>> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
>> #1  0x000000000081469e in reraise_fatal (signum=11) at
>> global/signal_handler.cc:58
>> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
>> #3  <signal handler called>
>> #4  SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
>> at mds/snap.cc:112
>>
>> #5  0x000000000055d58b in MDCache::check_realm_past_parents
>> (this=0x2b49200, realm=0x0) at mds/MDCache.cc:4495
>> #6  0x0000000000572eec in
>> MDCache::choose_lock_states_and_reconnect_caps (this=0x2b49200) at
>> mds/MDCache.cc:4533
>> #7  0x00000000005931a0 in MDCache::rejoin_gather_finish
>> (this=0x2b49200) at mds/MDCache.cc:4444
>> #8  0x000000000059b9d5 in MDCache::rejoin_send_rejoins
>> (this=0x2b49200) at mds/MDCache.cc:3388
>> #9  0x00000000004a8721 in MDS::rejoin_joint_start (this=0x2b5e000) at
>> mds/MDS.cc:1404
>> #10 0x00000000004c253a in MDS::handle_mds_map (this=0x2b5e000,
>> m=<value optimized out>) at mds/MDS.cc:968
>> #11 0x00000000004c4513 in MDS::handle_core_message (this=0x2b5e000,
>> m=0x2b4d800) at mds/MDS.cc:1651
>> #12 0x00000000004c45ef in MDS::_dispatch (this=0x2b5e000, m=0x2b4d800)
>> at mds/MDS.cc:1790
>> #13 0x00000000004c628b in MDS::ms_dispatch (this=0x2b5e000,
>> m=0x2b4d800) at mds/MDS.cc:1602
>> #14 0x00000000007acb49 in Messenger::ms_deliver_dispatch
>> (this=0x2b41680) at msg/Messenger.h:178
>> #15 SimpleMessenger::dispatch_entry (this=0x2b41680) at
>> msg/SimpleMessenger.cc:363
>> #16 0x00000000007336ed in SimpleMessenger::DispatchThread::entry() ()
>> #17 0x00007f10c00ca8ca in start_thread (arg=<value optimized out>) at
>> pthread_create.c:300
>> #18 0x00007f10be95292d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #19 0x0000000000000000 in ?? ()
>>
>> So I wonder, is the crash because of the missing-file message?
>
> Okay, that is what I wanted. It looks like it can't find the
> snaprealm, and I have a pretty good guess why.
> If you're building your own binaries, you can apply the patch below
> and I bet things will work. (Let me know if they do or don't!)
> -Greg
>
>
> diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
> index 70faeb8..becccf5 100644
> --- a/src/mds/CInode.cc
> +++ b/src/mds/CInode.cc
> @@ -2130,7 +2130,7 @@ SnapRealm *CInode::find_snaprealm()
>   while (!cur->snaprealm) {
>     if (cur->get_parent_dn())
>       cur = cur->get_parent_dn()->get_dir()->get_inode();
> -    else if (get_projected_parent_dn())
> +    else if (cur->get_projected_parent_dn())
>       cur = cur->get_projected_parent_dn()->get_dir()->get_inode();
>     else
>       break;