On 6/19/24 10:30, Xiubo Li wrote:
On 6/19/24 16:13, Dietmar Rieder wrote:

Hi Xiubo,

[...]

0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
 in thread 7f90fa912700 thread_name:md_log_replay

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is a known bug, please see https://tracker.ceph.com/issues/61009.

As a workaround, I am afraid you need to trim the journal logs first and then try to restart the MDS daemons. At the same time, please follow the workaround in https://tracker.ceph.com/issues/61009#note-26

I see, I'll try to do this. Are there any caveats or issues to expect from trimming the journal logs?

Certainly you will lose the dirty metadata in the journals.

Is there a step-by-step guide on how to perform the trimming? Should all MDS daemons be stopped beforehand?

Please follow https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts.
OK, when I run the cephfs-journal-tool I get an error:

# cephfs-journal-tool journal export backup.bin
Error ((22) Invalid argument)

My cluster is managed by cephadm, so (in my stress situation) I'm not able to find the correct way to use cephfs-journal-tool.
I'm sure it is something stupid that I'm missing but I'd be happy for any hint.
Thanks Dietmar
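
A possible explanation for the error above, offered as an assumption and not confirmed anywhere in this thread: recent cephfs-journal-tool releases expect an explicit --rank=<fs_name>:<rank> argument, and on a cephadm-managed cluster the tool is normally run from inside `cephadm shell`. The sketch below only prints the command line it would use. FS_NAME and MDS_RANK are hypothetical placeholders; the real file system name has to be looked up first (for example with `ceph fs ls`).

```shell
#!/bin/sh
# Sketch only: FS_NAME and MDS_RANK are hypothetical placeholders,
# not values taken from this thread. Look up the real name with `ceph fs ls`.
FS_NAME="${FS_NAME:-myfs}"
MDS_RANK="${MDS_RANK:-0}"

# Build the export command line; --rank takes the form <fs_name>:<rank>.
journal_export_cmd() {
    printf 'cephfs-journal-tool --rank=%s:%s journal export %s\n' "$1" "$2" "$3"
}

# Print the command instead of running it, so it can be reviewed first.
# On a cephadm cluster it would be pasted into a `cephadm shell` session.
journal_export_cmd "$FS_NAME" "$MDS_RANK" backup.bin
```

Printing the command first is deliberate: journal operations are destructive further down the recovery procedure, so it seems safer to review each line before running it. Note also that a file written inside a `cephadm shell` container may not survive the session unless it is placed on a path mounted from the host.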
Sorry for the many (naive) questions, but I do not want to make any mistake here.

Since the journal logs were corrupted and couldn't be replayed by the MDS when starting, the MDS crash will continue unless you manually repair or truncate them.

Thanks
- Xiubo

Thanks for your support,
Dietmar

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/ 5 rgw_access
   1/ 5 rgw_dbstore
   1/ 5 rgw_flight
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_t
   0/ 5 seastore_cleaner
   0/ 5 seastore_epm
   0/ 5 seastore_lba
   0/ 5 seastore_fixedkv_tree
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 seastore_backref
   0/ 5 alienstore
   1/ 5 mclock
   0/ 5 cyanstore
   1/ 5 ceph_exporter
   1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7f90fa912700 / md_log_replay
  7f90fb914700 /
  7f90fc115700 / MR_Finisher
  7f90fd117700 / PQ_Finisher
  7f90fe119700 / ms_dispatch
  7f910011d700 / ceph-mds
  7f9102121700 / ms_dispatch
  7f9103123700 / io_context_pool
  7f9104125700 / admin_socket
  7f9104926700 / msgr-worker-2
  7f9105127700 / msgr-worker-1
  7f9105928700 / msgr-worker-0
  7f910d8eab00 / ceph-mds
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-02.duujba.log
--- end dump of recent events ---

I have no idea how to resolve this and would be grateful for any help.

Dietmar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx