Re: MDS crash, won't start up again

On Wed, May 23, 2012 at 5:28 AM, Felix Feinhals
<ff@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hey,
>
> OK, I installed libc-dbg and ran your commands. Now this comes up:
>
> gdb /usr/bin/ceph-mds core
>
> snip
>
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/ceph-mds...Reading symbols from
> /usr/lib/debug/usr/bin/ceph-mds...done.
> (no debugging symbols found)...done.
> [New Thread 22980]
> [New Thread 22984]
> [New Thread 22986]
> [New Thread 22979]
> [New Thread 22970]
> [New Thread 22981]
> [New Thread 22971]
> [New Thread 22976]
> [New Thread 22973]
> [New Thread 22975]
> [New Thread 22974]
> [New Thread 22972]
> [New Thread 22978]
> [New Thread 22982]
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib/libpthread.so.0...Reading symbols from
> /usr/lib/debug/lib/libpthread-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libcrypto++.so.8
> Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libuuid.so.1
> Reading symbols from /lib/librt.so.1...Reading symbols from
> /usr/lib/debug/lib/librt-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libtcmalloc.so.0
> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libstdc++.so.6
> Reading symbols from /lib/libm.so.6...Reading symbols from
> /usr/lib/debug/lib/libm-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/libc.so.6...Reading symbols from
> /usr/lib/debug/lib/libc-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols
> from /usr/lib/debug/lib/ld-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libunwind.so.7
> Core was generated by `/usr/bin/ceph-mds -i c --pid-file
> /var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> 41      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
>        in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>
> snip
>
> Now
>
> thread apply all bt
>
> ...
>
> thread 1
> [Switching to thread 1 (Thread 22977)]#0  0x00007f10c00d2ebb in raise
> (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> 41      in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>
>
> Thread 1 (Thread 22977):
> ---Type <return> to continue, or q <return> to quit---
> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> #1  0x000000000081469e in reraise_fatal (signum=11) at
> global/signal_handler.cc:58
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
> #3  <signal handler called>
> #4  SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
> at mds/snap.cc:112
>
> #5  0x000000000055d58b in MDCache::check_realm_past_parents
> (this=0x2b49200, realm=0x0) at mds/MDCache.cc:4495
> #6  0x0000000000572eec in
> MDCache::choose_lock_states_and_reconnect_caps (this=0x2b49200) at
> mds/MDCache.cc:4533
> #7  0x00000000005931a0 in MDCache::rejoin_gather_finish
> (this=0x2b49200) at mds/MDCache.cc:4444
> #8  0x000000000059b9d5 in MDCache::rejoin_send_rejoins
> (this=0x2b49200) at mds/MDCache.cc:3388
> #9  0x00000000004a8721 in MDS::rejoin_joint_start (this=0x2b5e000) at
> mds/MDS.cc:1404
> #10 0x00000000004c253a in MDS::handle_mds_map (this=0x2b5e000,
> m=<value optimized out>) at mds/MDS.cc:968
> #11 0x00000000004c4513 in MDS::handle_core_message (this=0x2b5e000,
> m=0x2b4d800) at mds/MDS.cc:1651
> #12 0x00000000004c45ef in MDS::_dispatch (this=0x2b5e000, m=0x2b4d800)
> at mds/MDS.cc:1790
> #13 0x00000000004c628b in MDS::ms_dispatch (this=0x2b5e000,
> m=0x2b4d800) at mds/MDS.cc:1602
> #14 0x00000000007acb49 in Messenger::ms_deliver_dispatch
> (this=0x2b41680) at msg/Messenger.h:178
> #15 SimpleMessenger::dispatch_entry (this=0x2b41680) at
> msg/SimpleMessenger.cc:363
> #16 0x00000000007336ed in SimpleMessenger::DispatchThread::entry() ()
> #17 0x00007f10c00ca8ca in start_thread (arg=<value optimized out>) at
> pthread_create.c:300
> #18 0x00007f10be95292d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #19 0x0000000000000000 in ?? ()
>
> So I wonder: is the crash because of the missing file message?

Okay, that is what I wanted. Frame #5 passes realm=0x0 into
MDCache::check_realm_past_parents, so it looks like it can't find the
snaprealm, and I have a pretty good guess why.
If you're building your own binaries, you can apply the patch below
and I bet things will work. (Let me know if they do or don't!)
-Greg


diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index 70faeb8..becccf5 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -2130,7 +2130,7 @@ SnapRealm *CInode::find_snaprealm()
   while (!cur->snaprealm) {
     if (cur->get_parent_dn())
       cur = cur->get_parent_dn()->get_dir()->get_inode();
-    else if (get_projected_parent_dn())
+    else if (cur->get_projected_parent_dn())
       cur = cur->get_projected_parent_dn()->get_dir()->get_inode();
     else
       break;
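
For readers who want to see why the one-word change matters, here is a
minimal sketch of the loop. The Inode and SnapRealm structs are
hypothetical stand-ins, not the real Ceph classes: "parent" models the
get_parent_dn() path and "projected_parent" models the
get_projected_parent_dn() path. It assumes a chain in which an ancestor
is linked only through a projected parent, which may or may not be the
exact situation in the crashing MDS.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are NOT the real Ceph classes.
struct SnapRealm { int id; };

struct Inode {
    SnapRealm *snaprealm = nullptr;
    Inode *parent = nullptr;            // models get_parent_dn()->get_dir()->get_inode()
    Inode *projected_parent = nullptr;  // models the get_projected_parent_dn() path
};

// Pre-patch shape: the second test inspects `start` instead of `cur`.
// (If `start` had a projected parent but `cur` did not, this version
// would also dereference a null pointer; the demo below avoids that case.)
SnapRealm *find_snaprealm_buggy(Inode *start) {
    Inode *cur = start;
    while (!cur->snaprealm) {
        if (cur->parent)
            cur = cur->parent;
        else if (start->projected_parent)  // wrong object checked
            cur = cur->projected_parent;
        else
            break;
    }
    return cur->snaprealm;
}

// Post-patch shape: both tests inspect the inode currently being visited.
SnapRealm *find_snaprealm_fixed(Inode *start) {
    Inode *cur = start;
    while (!cur->snaprealm) {
        if (cur->parent)
            cur = cur->parent;
        else if (cur->projected_parent)
            cur = cur->projected_parent;
        else
            break;
    }
    return cur->snaprealm;
}

// child --parent--> mid --projected_parent--> root (owns the realm)
void demo() {
    SnapRealm realm{1};
    Inode root, mid, child;
    root.snaprealm = &realm;
    mid.projected_parent = &root;
    child.parent = &mid;

    // The buggy walk stops at `mid` and hands back a null realm.
    assert(find_snaprealm_buggy(&child) == nullptr);
    // The fixed walk follows mid's projected parent and finds the realm.
    assert(find_snaprealm_fixed(&child) == &realm);
}
```

In this sketch the buggy walk returns a null realm, which is the kind of
value that shows up as realm=0x0 in frame #5 of the backtrace above.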