RE: osd/ReplicatedPG.cc:2403: FAILED assert(!missing.is_missing(soid))

Christian Brunner <christian@xxxxxxxxxxxxxx> · Tue, 26 Oct 2010 21:10:33 +0200

Does anyone know which commit fixes this? I don't want to switch to the unstable branch at the moment.

Christian

-----Original Message-----
From: Smets, Jan (Jan) [mailto:jan.smets@xxxxxxxxxxxxxxxxxx]
Sent: Tuesday, 26 October 2010 14:37
To: Christian Brunner
Subject: RE: osd/ReplicatedPG.cc:2403: FAILED assert(!missing.is_missing(soid))

Hi

I think I've seen this one before, and I think it's fixed in the unstable branch.

Can you try that one?

- Jan

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Christian Brunner
Sent: Tuesday, 26 October 2010 14:15
To: ceph-devel@xxxxxxxxxxxxxxx
Subject: osd/ReplicatedPG.cc:2403: FAILED assert(!missing.is_missing(soid))

Here is another problem (I think it's unrelated to the previous one, but I'm not sure).

One of our osds crashed with the following message:

osd/ReplicatedPG.cc: In function 'void ReplicatedPG::sub_op_modify(MOSDSubOp*)':
osd/ReplicatedPG.cc:2403: FAILED assert(!missing.is_missing(soid))
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (OSD::dequeue_op(PG*)+0x374) [0x4d27a4]
2: (ThreadPool::worker()+0x291) [0x5b69d1]
3: (ThreadPool::WorkThread::entry()+0xd) [0x4f36dd]
4: (Thread::_entry_func(void*)+0x7) [0x470927]
5: (()+0x77e1) [0x7fe83e60e7e1]
6: (clone()+0x6d) [0x7fe83d83251d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (sigabrt_handler(int)+0x7d) [0x5c767d]
2: (()+0x32a30) [0x7fe83d783a30]
3: (gsignal()+0x35) [0x7fe83d7839b5]
4: (abort()+0x175) [0x7fe83d785195]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fe83e028aad]
6: (()+0xbcc36) [0x7fe83e026c36]
7: (()+0xbcc63) [0x7fe83e026c63]
8: (()+0xbcd5e) [0x7fe83e026d5e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x448) [0x5b5338]
10: (ReplicatedPG::sub_op_modify(MOSDSubOp*)+0x719) [0x48d8b9]
11: (OSD::dequeue_op(PG*)+0x374) [0x4d27a4]
12: (ThreadPool::worker()+0x291) [0x5b69d1]
13: (ThreadPool::WorkThread::entry()+0xd) [0x4f36dd]
14: (Thread::_entry_func(void*)+0x7) [0x470927]
15: (()+0x77e1) [0x7fe83e60e7e1]
16: (clone()+0x6d) [0x7fe83d83251d]

When we try to restart the cosd, we see the following log output:

2010-10-26 13:51:52.176904 7f005e83e720 journal read_entry 10102640640: seq 188804 728 bytes
2010-10-26 13:51:52.308172 7f0054f31710 osd5 39 map says i am down or have a different address.  switching to boot state.
2010-10-26 13:51:52.308222 7f0054f31710 log [WRN] : map e39 wrongly marked me down
2010-10-26 13:51:56.097515 7f0054f31710 journal throttle: waited for ops

After that, we are unable to perform any rados operations in our cluster.
The only way to resolve this was to kill the cosd again.

Regards,
Christian
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
