Since upgrading to 0.56.3 I've seen the following crash:

   -46> 2013-02-14 13:28:47.274807 7f5b467d8700 0 log [INF] : 4.94a restarting backfill on osd.13 from (0'0,0'0] MAX to 16759'8569
   -45> 2013-02-14 13:28:47.290994 7f5b44fd5700 0 log [INF] : 4.38b restarting backfill on osd.34 from (0'0,0'0] MAX to 16759'1035
   -44> 2013-02-14 13:28:47.713665 7f5b447d4700 0 log [INF] : 4.daf restarting backfill on osd.53 from (0'0,0'0] MAX to 16759'2982
   -43> 2013-02-14 13:28:50.139811 7f5b45fd7700 0 log [INF] : 4.996 restarting backfill on osd.11 from (0'0,0'0] MAX to 16710'4899
   -42> 2013-02-14 13:28:50.531244 7f5b45fd7700 0 log [INF] : 4.383 restarting backfill on osd.12 from (0'0,0'0] MAX to 16685'1070
   -41> 2013-02-14 13:28:50.546202 7f5b457d6700 -1 filestore(/ceph/osd.22/) could not find 9d20a9b1/rbd_data.195bbf6b8b4567.0000000000000840/78f//4 in index: (2) No such file or directory
   -40> 2013-02-14 13:28:50.546352 7f5b457d6700 -1 filestore(/ceph/osd.22/) could not find 396049b1/rbd_data.19a0ff6b8b4567.0000000000001e4c/4a9//4 in index: (2) No such file or directory
   -39> 2013-02-14 13:28:53.061214 7f5b46fd9700 -1 filestore(/ceph/osd.22/) could not find 1f6581e5/rbd_data.1bca886b8b4567.0000000000004a83/2b4//4 in index: (2) No such file or directory
   -38> 2013-02-14 13:28:54.115547 7f5b46fd9700 -1 filestore(/ceph/osd.22/) could not find a4c49703/rbd_data.1949406b8b4567.0000000000000140/512//4 in index: (2) No such file or directory
   -37> 2013-02-14 13:30:08.633422 7f5b477da700 -1 filestore(/ceph/osd.22/) could not find 6201453c/rbd_data.19653b6b8b4567.0000000000001e53/797//4 in index: (2) No such file or directory
   -36> 2013-02-14 13:30:08.633728 7f5b477da700 -1 filestore(/ceph/osd.22/) could not find 6201453c/rbd_data.19653b6b8b4567.0000000000001e53/797//4 in index: (2) No such file or directory
   -35> 2013-02-14 13:30:08.994236 7f5b477da700 -1 filestore(/ceph/osd.22/) could not find fe9a110f/rbd_data.195bbf6b8b4567.000000000000001e/78f//4 in index: (2) No such file or directory
   -34> 2013-02-14 13:30:08.994436 7f5b477da700 -1 filestore(/ceph/osd.22/) could not find ef19810f/rb.0.20b752.6b8b4567.0000000008bb/5d9//4 in index: (2) No such file or directory
   -33> 2013-02-14 13:30:08.994520 7f5b477da700 -1 filestore(/ceph/osd.22/) could not find e907910f/rbd_data.1949406b8b4567.000000000000197f/512//4 in index: (2) No such file or directory
   -32> 2013-02-14 13:30:09.313753 7f5b477da700 -1 filestore(/ceph/osd.22/) could not find ff2df9b3/rbd_data.1949406b8b4567.000000000000184b/512//4 in index: (2) No such file or directory
   -31> 2013-02-14 13:30:09.358139 7f5b44fd5700 -1 filestore(/ceph/osd.22/) could not find 15194ffd/rbd_data.19c45d6b8b4567.0000000000001eb3/82b//4 in index: (2) No such file or directory
   -30> 2013-02-14 13:30:09.446870 7f5b457d6700 -1 filestore(/ceph/osd.22/) could not find f20db15d/rbd_data.1949406b8b4567.00000000000011ae/512//4 in index: (2) No such file or directory
   -29> 2013-02-14 13:30:09.482069 7f5b46fd9700 -1 filestore(/ceph/osd.22/) could not find 9e73fa3b/rbd_data.195bbf6b8b4567.0000000000000814/78f//4 in index: (2) No such file or directory
   -28> 2013-02-14 13:30:09.482124 7f5b46fd9700 -1 filestore(/ceph/osd.22/) could not find 3b597a3b/rbd_data.1bca886b8b4567.0000000000003a17/2b4//4 in index: (2) No such file or directory
   -27> 2013-02-14 13:30:09.482175 7f5b46fd9700 -1 filestore(/ceph/osd.22/) could not find 9e290a3b/rbd_data.1bca886b8b4567.0000000000000c5f/28f//4 in index: (2) No such file or directory
   -26> 2013-02-14 13:30:09.824235 7f5b44fd5700 -1 filestore(/ceph/osd.22/) could not find 7439f3ad/rbd_data.195bbf6b8b4567.00000000000018c3/78f//4 in index: (2) No such file or directory
   -25> 2013-02-14 13:30:09.828177 7f5b44fd5700 -1 filestore(/ceph/osd.22/) could not find fc7533ad/rbd_data.1bca886b8b4567.0000000000003aba/2b4//4 in index: (2) No such file or directory
   -24> 2013-02-14 13:30:10.210909 7f5b457d6700 -1 filestore(/ceph/osd.22/) could not find 37c6171d/rbd_data.1949406b8b4567.000000000000117a/512//4 in index: (2) No such file or directory
   -23> 2013-02-14 13:30:10.218927 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/82f//4 in index: (2) No such file or directory
   -22> 2013-02-14 13:30:10.219095 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/7f9//4 in index: (2) No such file or directory
   -21> 2013-02-14 13:30:10.219230 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/7c5//4 in index: (2) No such file or directory
   -20> 2013-02-14 13:30:10.219270 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/78d//4 in index: (2) No such file or directory
   -19> 2013-02-14 13:30:10.219311 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/759//4 in index: (2) No such file or directory
   -18> 2013-02-14 13:30:10.219353 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/729//4 in index: (2) No such file or directory
   -17> 2013-02-14 13:30:10.219397 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/6f9//4 in index: (2) No such file or directory
   -16> 2013-02-14 13:30:10.219433 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/6c5//4 in index: (2) No such file or directory
   -15> 2013-02-14 13:30:10.219473 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/691//4 in index: (2) No such file or directory
   -14> 2013-02-14 13:30:10.219527 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/65d//4 in index: (2) No such file or directory
   -13> 2013-02-14 13:30:10.219582 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/629//4 in index: (2) No such file or directory
   -12> 2013-02-14 13:30:10.219643 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/5f9//4 in index: (2) No such file or directory
   -11> 2013-02-14 13:30:10.219689 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/5c5//4 in index: (2) No such file or directory
   -10> 2013-02-14 13:30:10.219740 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/58f//4 in index: (2) No such file or directory
    -9> 2013-02-14 13:30:10.219910 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/561//4 in index: (2) No such file or directory
    -8> 2013-02-14 13:30:10.219965 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/533//4 in index: (2) No such file or directory
    -7> 2013-02-14 13:30:10.219995 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find 4359ad48/rbd_data.1949406b8b4567.000000000000094c/512//4 in index: (2) No such file or directory
    -6> 2013-02-14 13:30:10.220027 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/501//4 in index: (2) No such file or directory
    -5> 2013-02-14 13:30:10.220069 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/4d2//4 in index: (2) No such file or directory
    -4> 2013-02-14 13:30:10.220108 7f5b467d8700 -1 filestore(/ceph/osd.22/) could not find c4800d48/rbd_data.19bd426b8b4567.0000000000000000/4a5//4 in index: (2) No such file or directory
    -3> 2013-02-14 13:30:10.568978 7f5b43fd3700 -1 filestore(/ceph/osd.22/) could not find 51b6e635/rbd_data.19b4f56b8b4567.00000000000014d9/815//4 in index: (2) No such file or directory
    -2> 2013-02-14 13:30:12.526121 7f5b45fd7700 -1 filestore(/ceph/osd.22/) could not find f9f978e1/rbd_data.19f7db6b8b4567.0000000000001e92/839//4 in index: (2) No such file or directory
    -1> 2013-02-14 13:30:12.526615 7f5b45fd7700 -1 filestore(/ceph/osd.22/) could not find f9f978e1/rbd_data.19f7db6b8b4567.0000000000001e92/803//4 in index: (2) No such file or directory
     0> 2013-02-14 13:30:35.779344 7f5b4afe1700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f5b4afe1700

 ceph version 0.56.3-4-g5d79bef (5d79bef0715fe085be26b69f0a323ef19c75b830)
 1: /usr/bin/ceph-osd() [0x7a9e59]
 2: (()+0xeff0) [0x7f5b58679ff0]
 3: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0xe3) [0x7f5b57758b23]
 4: (tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long)+0x23) [0x7f5b57758ba3]
 5: (operator delete(void*)+0x280) [0x7f5b57764be0]
 6: (__gnu_cxx::hashtable<std::pair<hobject_t const, pg_log_entry_t*>, hobject_t, __gnu_cxx::hash<hobject_t>, std::_Select1st<std::pair<hobject_t const, pg_log_entry_t*> >, std::equal_to<hobject_t>, std::allocator<pg_log_entry_t*> >::clear()+0x57) [0x6d77c7]
 7: (PG::~PG()+0x3ae) [0x6cd09e]
 8: (ReplicatedPG::~ReplicatedPG()+0x1b6) [0x5e67d6]
 9: (OSD::handle_pg_remove(std::tr1::shared_ptr<OpRequest>)+0xa24) [0x6398a4]
 10: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x14b) [0x63b03b]
 11: (OSD::_dispatch(Message*)+0x26b) [0x63bfbb]
 12: (OSD::ms_dispatch(Message*)+0x1ea) [0x63c6ea]
 13: (DispatchQueue::entry()+0x2e1) [0x8c33a1]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7c8ded]
 15: (()+0x68ca) [0x7f5b586718ca]
 16: (clone()+0x6d) [0x7f5b56ae0b6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   0/ 0 mds
   0/ 0 mds_balancer
   0/ 0 mds_locker
   0/ 0 mds_log
   0/ 0 mds_log_expire
   0/ 0 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 0 journaler
   0/ 0 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   0/ 0 journal
   0/ 0 ms
   0/ 0 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   0/ 0 hadoop
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 100000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.22.log
--- end dump of recent events ---

Stefan

On 14.02.2013 03:31, Sage Weil wrote:
> We've fixed an important bug that a few users were hitting with
> unresponsive OSDs and internal heartbeat timeouts. This, along with a
> range of less critical fixes, was sufficient to justify another point
> release. Any production users should upgrade.
>
> Notable changes include:
>
> * osd: flush peering work queue prior to start
> * osd: persist osdmap epoch for idle PGs
> * osd: fix and simplify connection handling for heartbeats
> * osd: avoid crash on invalid admin command
> * mon: fix rare races with monitor elections and commands
> * mon: enforce that OSD reweights be between 0 and 1 (NOTE: not CRUSH
>   weights)
> * mon: approximate client, recovery bandwidth logging
> * radosgw: fixed some XML formatting to conform to Swift API inconsistency
> * radosgw: fix usage accounting bug; add repair tool
> * radosgw: make fallback URI configurable (necessary on some web servers)
> * librbd: fix handling for interrupted 'unprotect' operations
> * mds, ceph-fuse: allow file and directory layouts to be modified via
>   virtual xattrs
>
> You can get v0.56.3 from the usual locations:
>
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://ceph.com/download/ceph-0.56.3.tar.gz
> * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
> * For RPMs, see http://ceph.com/docs/master/install/rpm
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com