Re: MDS stuck in "up:replay"

Thomas Widhalm <widhalmt@xxxxxxxxxxxxx> · Thu, 23 Feb 2023 12:53:20 +0100

Yes, it's still:

 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f6bf079e43f]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7f6bf079e605]
 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x56474700ece5]
 4: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x59cd) [0x56474731140d]
 5: (EUpdate::replay(MDSRank*)+0x40) [0x5647473125a0]
 6: (MDLog::_replay_thread()+0x9b3) [0x564747298443]
 7: (MDLog::ReplayThread::entry()+0x11) [0x564746f54e31]
 8: /lib64/libpthread.so.0(+0x81ca) [0x7f6bef78e1ca]
 9: clone()

     0> 2023-02-22T17:07:28.647+0000 7f6be0358700 -1 *** Caught signal (Aborted) **
 in thread 7f6be0358700 thread_name:md_log_replay

 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f6bef798cf0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f6bf079e499]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7f6bf079e605]
 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x56474700ece5]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x59cd) [0x56474731140d]
 8: (EUpdate::replay(MDSRank*)+0x40) [0x5647473125a0]
 9: (MDLog::_replay_thread()+0x9b3) [0x564747298443]
 10: (MDLog::ReplayThread::entry()+0x11) [0x564746f54e31]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f6bef78e1ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

If you need more, just let me know, please.
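
When checking whether a backtrace like the ones above matches the one in a tracker issue, the load addresses and `+0x...` offsets get in the way, because they can differ between builds and between the assert and signal-handler dumps. A minimal Python sketch for stripping them follows. It assumes frames use the ` N: (symbol+0xOFF) [0xADDR]` layout shown above and may be wrapped by the mail client. The helpers `normalize_frames` and `same_call_path` are hypothetical names for illustration, not part of any Ceph tooling.

```python
import re

# A frame line starts with its index, e.g. " 5: (EUpdate::replay(MDSRank*)+0x40) [0x...]"
FRAME_START = re.compile(r"^\s*(\d+):\s*(.*)$")


def _is_open(frame):
    """True if the frame text still has an unclosed parenthesis (wrapped line)."""
    return frame.count("(") > frame.count(")")


def normalize_frames(trace):
    """Return the frames of one backtrace with offsets and addresses removed."""
    frames = []
    for raw in trace.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = FRAME_START.match(line)
        if m:
            frames.append(m.group(2).strip())
        elif frames and (_is_open(frames[-1]) or line.startswith("[0x")):
            # Re-join a frame that was wrapped onto the next line.
            frames[-1] += " " + line
        # Anything else (version banner, NOTE line) is ignored.

    cleaned = []
    for frame in frames:
        frame = re.sub(r"\s*\[0x[0-9a-fA-F]+\]$", "", frame)  # trailing [0xADDR]
        frame = re.sub(r"\+0x[0-9a-fA-F]+\)$", ")", frame)    # +0xOFF before ")"
        if frame.startswith("(") and frame.endswith(")"):
            frame = frame[1:-1]                                # outer parentheses
        cleaned.append(frame)
    return cleaned


def same_call_path(a, b):
    """True if the shorter frame list is the tail of the longer one.

    The signal-handler dump has extra leading frames (gsignal, abort),
    so only the common tail is compared.
    """
    short, long_ = sorted((a, b), key=len)
    return bool(short) and long_[-len(short):] == short


if __name__ == "__main__":
    # Abbreviated from the two dumps in this message (frames omitted for brevity).
    assert_dump = """
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x135) [0x7f6bf079e43f]
 5: (EUpdate::replay(MDSRank*)+0x40) [0x5647473125a0]
 9: clone()
"""
    signal_dump = """
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x18f) [0x7f6bf079e499]
 8: (EUpdate::replay(MDSRank*)+0x40) [0x5647473125a0]
 12: clone()
"""
    print(same_call_path(normalize_frames(assert_dump),
                         normalize_frames(signal_dump)))
```

Each dump has to be passed in separately. Comparing only symbol names is a rough heuristic: two traces with the same call path can still fail on different assertions, so the asserting file and line in the full log remain the more reliable thing to compare.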

On 23.02.23 01:34, Xiubo Li wrote:

On 23/02/2023 05:56, Thomas Widhalm wrote:
Ah, sorry. My bad.

The MDS crashed and I restarted them. And I'm waiting for them to 
crash again.

There's a tracker for this or a related issue: 
https://tracker.ceph.com/issues/58489

Is the call trace the same as the one in this tracker?

Thanks,

Is there any place where I can upload logs for you? I'm still a bit 
new to Ceph, but I guess you'd like to have the crash logs?

Thank you in advance. Any help is really appreciated. My filesystems 
are still completely down.

Cheers,
Thomas

On 22.02.23 18:36, Patrick Donnelly wrote:
On Wed, Feb 22, 2023 at 12:10 PM Thomas Widhalm 
<widhalmt@xxxxxxxxxxxxx> wrote:

Hi,

Thanks for the idea!

I tried it immediately, but the MDS are still in up:replay. So far 
they haven't crashed, but that usually takes a few minutes.

So no effect so far. :-(

The commands I gave were for producing hopefully useful debug logs.
Not intended to fix the problem for you.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
