Hi List!
I have tracked down the bad commit to
de640d85fa3e0e5e5a31704eab5a8714a1ffe867.
I have also created a patch that fixes this error on my test cluster.
I am attaching it here for peer-review.
---
Thanks,
Dyweni
On Sat, 14 May 2011 19:17:42 -0500, Dyweni - Ceph-Devel wrote:
Hi List!
When creating a brand new cluster, I get the following segmentation
fault:
=== osd.2 ===
pushing conf and monmap to ceph2
Warning: Permanently added 'ceph2' (ECDSA) to the list of known hosts.
umount: /data/osd2: not mounted
umount: /dev/sda: not mounted
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/sda
nodesize 4096 leafsize 4096 sectorsize 4096 size 74.53GB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
** WARNING: Ceph is still under development. Any feedback can be directed **
** at ceph-devel@xxxxxxxxxxxxxxx or http://ceph.newdream.net/. **
*** Caught signal (Segmentation fault) **
in thread 0xb70f2b30
ceph version 0.27.1-401-g6af0379 (commit:6af0379e27ac71a7abd8c9ebb0145ae8b9f66cc4)
1: (ceph::BackTrace::BackTrace(int)+0x1f) [0x8465fcf]
2: /usr/bin/cosd() [0x84d8844]
3: [0xb77f1400]
4: (pthread_spin_lock()+0x6) [0xb77c38d6]
5: (ceph::Spinlock::lock()+0x20) [0x82e42e8]
6: (ceph::atomic_t::dec()+0x12) [0x82e4418]
7: (RefCountedObject::put()+0x15) [0x82e48d9]
8: (MonClient::get_monmap_privately()+0x5f2) [0x84c81ec]
9: (main()+0x976) [0x82e0cce]
10: (__libc_start_main()+0xd9) [0xb7109ba9]
11: /usr/bin/cosd() [0x82e0101]
/usr/sbin/mkcephfs: line 239: 859 Segmentation fault (core dumped) $BINDIR/cosd -c $conf --monmap $dir/monmap -i $id --mkfs
failed: 'ssh ceph2 /usr/sbin/mkcephfs -d /tmp/mkcephfs.6ySmaVjdFm --init-daemon osd.2'
Here is the GDB backtrace:
(gdb) bt
#0 0xb77c6d6f in raise () from /lib/libpthread.so.0
#1 0x084d870f in reraise_fatal (signum=11) at common/signal.cc:63
#2 0x084d88ce in handle_fatal_signal (signum=11) at common/signal.cc:110
#3 <signal handler called>
#4 0xb77c38d6 in pthread_spin_lock () from /lib/libpthread.so.0
#5 0x082e42e8 in ceph::Spinlock::lock (this=0x4) at include/Spinlock.h:97
#6 0x082e4418 in ceph::atomic_t::dec (this=0x4) at include/atomic.h:75
#7 0x082e48d9 in RefCountedObject::put (this=0x0) at msg/Message.h:160
#8 0x084c81ec in MonClient::get_monmap_privately (this=0xbf81baf4) at mon/MonClient.cc:230
#9 0x082e0cce in main (argc=8, argv=0xbf81c1f4) at cosd.cc:130
My kernel is:
Linux version 2.6.39-rc7-git5-20110514-0905 (root@phenom) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #1 SMP Sat May 14 09:07:07 CDT 2011
--
Thanks,
Dyweni
From acf86f21d3c11e8edd82692a4fa27a5b88c538b0 Mon Sep 17 00:00:00 2001
From: root <root@xxxxxxxxxxxxxxxxx>
Date: Sun, 15 May 2011 08:54:13 -0500
Subject: [PATCH] fix segfault introduced by commit de640d85fa3e0e5e5a31704eab5a8714a1ffe867
That commit introduced the line 'cur_con->put()', which can be called
while cur_con is still NULL.
---
src/mon/MonClient.cc | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/mon/MonClient.cc b/src/mon/MonClient.cc
index 70e14e9..9707dfe 100644
--- a/src/mon/MonClient.cc
+++ b/src/mon/MonClient.cc
@@ -227,8 +227,10 @@ int MonClient::get_monmap_privately()
   hunting = true;  // reset this to true!
   cur_mon.clear();
 
-  cur_con->put();
-  cur_con = NULL;
+  if (cur_con) {
+    cur_con->put();
+    cur_con = NULL;
+  }
 
   if (monmap.epoch)
     return 0;
--
1.7.3.4
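For reviewers who want to see the failure mode in isolation, here is a minimal standalone sketch of the guard the patch adds. It is not the Ceph code: the `Connection` type, its `live` counter, and the `release_connection` helper are hypothetical names made up for illustration, and the real `RefCountedObject` uses an atomic counter rather than a plain int. The point is only that put() must not be called through a NULL pointer, which is what the 'this=0x0' frame in the backtrace shows.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted connection
// (illustration only; not Ceph's RefCountedObject).
struct Connection {
  static int live;   // number of Connection objects currently alive
  int nref;          // reference count; starts at 1 for the creator

  Connection() : nref(1) { ++live; }
  ~Connection() { --live; }

  // Drop one reference; destroy the object when the count reaches zero.
  void put() {
    assert(nref > 0);
    if (--nref == 0)
      delete this;
  }
};

int Connection::live = 0;

// Hypothetical helper mirroring the patched code path: release the
// reference held in `con` only if there is one, then clear the pointer.
// Returns true if a reference was actually dropped.
bool release_connection(Connection *&con) {
  if (!con)
    return false;    // nothing to release; calling put() here would crash
  con->put();
  con = nullptr;
  return true;
}
```

Calling `release_connection` on a pointer that is still NULL is a no-op, while the unguarded `con->put()` would dereference NULL as in the report above.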