Re: [EXTERN] Re: Urgent help with degraded filesystem needed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/19/24 16:13, Dietmar Rieder wrote:
Hi Xiubo,

[...]

     0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
 in thread 7f90fa912700 thread_name:md_log_replay

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]  7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is a known bug, please see https://tracker.ceph.com/issues/61009.

As a workaround I am afraid you need to trim the journal logs first and then try to restart the MDS daemons, And at the same time please follow the workaround in https://tracker.ceph.com/issues/61009#note-26

I see, I'll try to do this. Are there any caveats or issues to expect by trimming the journal logs?

Certainly you will lose the dirty metadata in the journals.

Is there a step by step guide on how to perform the trimming? Should all MDS be stopped before?

Please follow https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts.


Sorry for the lot of (naive) questions, but I do not want to make any mistake here.

Since the journal logs were corrupted and couldn't be replayed by the MDS when starting and the MDS crash will continue unless you manually repair or truncate it.

Thanks

- Xiubo

Thanks for your support,

Dietmar

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/ 5 rgw_access
   1/ 5 rgw_dbstore
   1/ 5 rgw_flight
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_t
   0/ 5 seastore_cleaner
   0/ 5 seastore_epm
   0/ 5 seastore_lba
   0/ 5 seastore_fixedkv_tree
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 seastore_backref
   0/ 5 alienstore
   1/ 5 mclock
   0/ 5 cyanstore
   1/ 5 ceph_exporter
   1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7f90fa912700 / md_log_replay
  7f90fb914700 /
  7f90fc115700 / MR_Finisher
  7f90fd117700 / PQ_Finisher
  7f90fe119700 / ms_dispatch
  7f910011d700 / ceph-mds
  7f9102121700 / ms_dispatch
  7f9103123700 / io_context_pool
  7f9104125700 / admin_socket
  7f9104926700 / msgr-worker-2
  7f9105127700 / msgr-worker-1
  7f9105928700 / msgr-worker-0
  7f910d8eab00 / ceph-mds
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-02.duujba.log
--- end dump of recent events ---


I have no idea how to resolve this and would be grateful for any help.

Dietmar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux