Does anyone know which commit fixes this? I don't want to switch to the
unstable branch at the moment.

Christian

-----Original Message-----
From: Smets, Jan (Jan) [mailto:jan.smets@xxxxxxxxxxxxxxxxxx]
Sent: Tuesday, 26 October 2010 14:37
To: Christian Brunner
Subject: RE: osd/ReplicatedPG.cc:2403: FAILED assert(!missing.is_missing(soid))

Hi

I think I've seen this one before, and I think it's fixed in the unstable
branch. Can you try that one?

- Jan

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Christian Brunner
Sent: Tuesday, 26 October 2010 14:15
To: ceph-devel@xxxxxxxxxxxxxxx
Subject: osd/ReplicatedPG.cc:2403: FAILED assert(!missing.is_missing(soid))

Here is another problem (I think it's unrelated to the previous one, but
I'm not sure). One of our osds crashed with the following message:

osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_modify(MOSDSubOp*)':
osd/ReplicatedPG.cc:2403: FAILED assert(!missing.is_missing(soid))
 ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
 1: (OSD::dequeue_op(PG*)+0x374) [0x4d27a4]
 2: (ThreadPool::worker()+0x291) [0x5b69d1]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x4f36dd]
 4: (Thread::_entry_func(void*)+0x7) [0x470927]
 5: (()+0x77e1) [0x7fe83e60e7e1]
 6: (clone()+0x6d) [0x7fe83d83251d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (ABRT) ***
 ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
 1: (sigabrt_handler(int)+0x7d) [0x5c767d]
 2: (()+0x32a30) [0x7fe83d783a30]
 3: (gsignal()+0x35) [0x7fe83d7839b5]
 4: (abort()+0x175) [0x7fe83d785195]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fe83e028aad]
 6: (()+0xbcc36) [0x7fe83e026c36]
 7: (()+0xbcc63) [0x7fe83e026c63]
 8: (()+0xbcd5e) [0x7fe83e026d5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x448) [0x5b5338]
 10: (ReplicatedPG::sub_op_modify(MOSDSubOp*)+0x719) [0x48d8b9]
 11: (OSD::dequeue_op(PG*)+0x374) [0x4d27a4]
 12: (ThreadPool::worker()+0x291) [0x5b69d1]
 13: (ThreadPool::WorkThread::entry()+0xd) [0x4f36dd]
 14: (Thread::_entry_func(void*)+0x7) [0x470927]
 15: (()+0x77e1) [0x7fe83e60e7e1]
 16: (clone()+0x6d) [0x7fe83d83251d]

When we try to restart the cosd, we see the following log output:

2010-10-26 13:51:52.176904 7f005e83e720 journal read_entry 10102640640: seq 188804 728 bytes
2010-10-26 13:51:52.308172 7f0054f31710 osd5 39 map says i am down or have a different address. switching to boot state.
2010-10-26 13:51:52.308222 7f0054f31710 log [WRN] : map e39 wrongly marked me down
2010-10-26 13:51:56.097515 7f0054f31710 journal throttle: waited for ops

After that we are unable to do any rados operations in our cluster. The
only way to resolve this was to kill the cosd again.

Regards,
Christian
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html