On 6/19/24 11:15, Dietmar Rieder wrote:
On 6/19/24 10:30, Xiubo Li wrote:

On 6/19/24 16:13, Dietmar Rieder wrote:

Hi Xiubo,

[...]

     0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
 in thread 7f90fa912700 thread_name:md_log_replay

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
 7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
 8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
 9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
 10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is a known bug, please see https://tracker.ceph.com/issues/61009. As a workaround, I am afraid you need to trim the journal logs first and then try to restart the MDS daemons. At the same time, please follow the workaround in https://tracker.ceph.com/issues/61009#note-26

I see, I'll try to do this. Are there any caveats or issues to expect from trimming the journal logs?

Certainly you will lose the dirty metadata in the journals.

Is there a step-by-step guide on how to perform the trimming? Should all MDS be stopped before?

Please follow https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts.

OK, when I run the cephfs-journal-tool I get an error:

# cephfs-journal-tool journal export backup.bin
Error ((22) Invalid argument)

My cluster is managed by cephadm, so (in my stress situation) I'm not able to find the correct way to use cephfs-journal-tool.

I'm sure it is something stupid that I'm missing, but I'd be happy for any hint.
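(In case it helps anyone else hitting the same "(22) Invalid argument": below is a hedged sketch of how cephfs-journal-tool is typically invoked on a cephadm-managed cluster, i.e. from inside the cephadm shell and with an explicit --rank, matching the --rank=cephfs:<n> form used further down. The backup file names are only illustrative, and I can't confirm that the missing --rank was the actual cause of the EINVAL above.)

# run on a host that has the admin keyring; opens a shell in a container with the ceph tools
cephadm shell

# inside the shell: export each rank's journal to a backup file before changing anything
cephfs-journal-tool --rank=cephfs:0 journal export backup.rank0.bin
cephfs-journal-tool --rank=cephfs:1 journal export backup.rank1.bin
cephfs-journal-tool --rank=cephfs:2 journal export backup.rank2.bin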
I ran the disaster recovery procedures now, as follows:

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:
  OPEN: 8737
  PURGED: 1
  SESSION: 9
  SESSIONS: 2
  SUBTREEMAP: 128
  TABLECLIENT: 2
  TABLESERVER: 30
  UPDATE: 9207
Errors: 0

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
Events by type:
  OPEN: 3
  SESSION: 1
  SUBTREEMAP: 34
  UPDATE: 32965
Errors: 0

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 event recover_dentries summary
Events by type:
  OPEN: 5289
  SESSION: 10
  SESSIONS: 3
  SUBTREEMAP: 128
  UPDATE: 76448
Errors: 0

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:all journal inspect
Overall journal integrity: OK
Overall journal integrity: DAMAGED
Corrupt regions:
  0xd9a84f243c-ffffffffffffffff
Overall journal integrity: OK

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal inspect
Overall journal integrity: OK

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
Overall journal integrity: DAMAGED
Corrupt regions:
  0xd9a84f243c-ffffffffffffffff

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal inspect
Overall journal integrity: OK

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal reset
old journal was 879331755046~508520587
new journal start will be 879843344384 (3068751 bytes past old end)
writing journal head
writing EResetJournal entry
done

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal reset
old journal was 934711229813~120432327
new journal start will be 934834864128 (3201988 bytes past old end)
writing journal head
writing EResetJournal entry
done

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal reset
old journal was 1334153584288~252692691
new journal start will be 1334409428992 (3152013 bytes past old end)
writing journal head
writing EResetJournal entry
done

[root@ceph01-b /]# cephfs-table-tool all reset session
{
    "0": {
        "data": {},
        "result": 0
    },
    "1": {
        "data": {},
        "result": 0
    },
    "2": {
        "data": {},
        "result": 0
    }
}

[root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
Overall journal integrity: OK

[root@ceph01-b /]# ceph fs reset cephfs --yes-i-really-mean-it

But now I hit the error below:

   -20> 2024-06-19T11:13:00.610+0000 7ff3694d0700 10 monclient: _send_mon_message to mon.cephmon-03 at v2:10.1.3.23:3300/0
   -19> 2024-06-19T11:13:00.637+0000 7ff3664ca700 2 mds.0.cache Memory usage: total 485928, rss 170860, heap 207156, baseline 182580, 0 / 33434 inodes have caps, 0 caps, 0 caps per inode
   -18> 2024-06-19T11:13:00.787+0000 7ff36a4d2700 1 mds.default.cephmon-03.chjusj Updating MDS map to version 8061 from mon.1
   -17> 2024-06-19T11:13:00.787+0000 7ff36a4d2700 1 mds.0.8058 handle_mds_map i am now mds.0.8058
   -16> 2024-06-19T11:13:00.787+0000 7ff36a4d2700 1 mds.0.8058 handle_mds_map state change up:rejoin --> up:active
   -15> 2024-06-19T11:13:00.787+0000 7ff36a4d2700 1 mds.0.8058 recovery_done -- successful recovery!
   -14> 2024-06-19T11:13:00.788+0000 7ff36a4d2700 1 mds.0.8058 active_start
   -13> 2024-06-19T11:13:00.789+0000 7ff36dcd9700 5 mds.beacon.default.cephmon-03.chjusj received beacon reply up:active seq 4 rtt 0.955007
   -12> 2024-06-19T11:13:00.790+0000 7ff36a4d2700 1 mds.0.8058 cluster recovered.
   -11> 2024-06-19T11:13:00.790+0000 7ff36a4d2700 4 mds.0.8058 set_osd_epoch_barrier: epoch=33596
   -10> 2024-06-19T11:13:00.790+0000 7ff3634c4700 5 mds.0.log _submit_thread 879843344432~2609 : EUpdate check_inode_max_size [metablob 0x100, 2 dirs]
    -9> 2024-06-19T11:13:00.791+0000 7ff3644c6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7ff3644c6700 time 2024-06-19T11:13:00.791580+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: 1660: FAILED ceph_assert(follows >= realm->get_newest_seq())

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7ff374ad3e15]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
 3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
 4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
 5: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
 6: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
 7: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
 8: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
 9: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
 10: (Context::complete(int)+0xd) [0x55da0a6775fd]
 11: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
 12: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
 13: clone()

    -8> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client handle_log_ack log(last 7) v1
    -7> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.647346+0000 mds.default.cephmon-03.chjusj (mds.0) 1 : cluster [ERR] loaded dup inode 0x10003e45d99 [415,head] v61632 at /home/balaz/.bash_history-54696.tmp, but inode 0x10003e45d99.head v61639 already exists at /home/balaz/.bash_history
    -6> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.648139+0000 mds.default.cephmon-03.chjusj (mds.0) 2 : cluster [ERR] loaded dup inode 0x10003e45d7c [415,head] v253612 at /home/rieder/.bash_history-10215.tmp, but inode 0x10003e45d7c.head v253630 already exists at /home/rieder/.bash_history
    -5> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.649483+0000 mds.default.cephmon-03.chjusj (mds.0) 3 : cluster [ERR] loaded dup inode 0x10003e45d83 [415,head] v164103 at /home/gottschling/.bash_history-44802.tmp, but inode 0x10003e45d83.head v164112 already exists at /home/gottschling/.bash_history
    -4> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.656221+0000 mds.default.cephmon-03.chjusj (mds.0) 4 : cluster [ERR] bad backtrace on directory inode 0x10003e42340
    -3> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.737282+0000 mds.default.cephmon-03.chjusj (mds.0) 5 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
    -2> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.804984+0000 mds.default.cephmon-03.chjusj (mds.0) 6 : cluster [ERR] bad backtrace on directory inode 0x10003e45d9f
    -1> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.805078+0000 mds.default.cephmon-03.chjusj (mds.0) 7 : cluster [ERR] bad backtrace on directory inode 0x10003e45d90
     0> 2024-06-19T11:13:00.792+0000 7ff3644c6700 -1 *** Caught signal (Aborted) **
 in thread 7ff3644c6700 thread_name:MR_Finisher

 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
 1: /lib64/libpthread.so.0(+0x12d20) [0x7ff373883d20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7ff374ad3e6f]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
 6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
 7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
 8: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
 9: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
 10: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
 11: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
 12: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
 13: (Context::complete(int)+0xd) [0x55da0a6775fd]
 14: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
 15: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
 16: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none   0/ 1 lockdep   0/ 1 context   1/ 1 crush   1/ 5 mds   1/ 5 mds_balancer   1/ 5 mds_locker   1/ 5 mds_log   1/ 5 mds_log_expire   1/ 5 mds_migrator
   0/ 1 buffer   0/ 1 timer   0/ 1 filer   0/ 1 striper   0/ 1 objecter   0/ 5 rados   0/ 5 rbd   0/ 5 rbd_mirror   0/ 5 rbd_replay   0/ 5 rbd_pwl
   0/ 5 journaler   0/ 5 objectcacher   0/ 5 immutable_obj_cache   0/ 5 client   1/ 5 osd   0/ 5 optracker   0/ 5 objclass   1/ 3 filestore   1/ 3 journal   0/ 0 ms
   1/ 5 mon   0/10 monc   1/ 5 paxos   0/ 5 tp   1/ 5 auth   1/ 5 crypto   1/ 1 finisher   1/ 1 reserver   1/ 5 heartbeatmap   1/ 5 perfcounter
   1/ 5 rgw   1/ 5 rgw_sync   1/ 5 rgw_datacache   1/ 5 rgw_access   1/ 5 rgw_dbstore   1/ 5 rgw_flight   1/ 5 javaclient   1/ 5 asok   1/ 1 throttle   0/ 0 refs
   1/ 5 compressor   1/ 5 bluestore   1/ 5 bluefs   1/ 3 bdev   1/ 5 kstore   4/ 5 rocksdb   4/ 5 leveldb   1/ 5 fuse   2/ 5 mgr   1/ 5 mgrc
   1/ 5 dpdk   1/ 5 eventtrace   1/ 5 prioritycache   0/ 5 test   0/ 5 cephfs_mirror   0/ 5 cephsqlite
   0/ 5 seastore   0/ 5 seastore_onode   0/ 5 seastore_odata   0/ 5 seastore_omap   0/ 5 seastore_tm   0/ 5 seastore_t   0/ 5 seastore_cleaner   0/ 5 seastore_epm   0/ 5 seastore_lba   0/ 5 seastore_fixedkv_tree   0/ 5 seastore_cache   0/ 5 seastore_journal   0/ 5 seastore_device   0/ 5 seastore_backref   0/ 5 alienstore
   1/ 5 mclock   0/ 5 cyanstore   1/ 5 ceph_exporter   1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7ff362cc3700 /
  7ff3634c4700 / md_submit
  7ff363cc5700 /
  7ff3644c6700 / MR_Finisher
  7ff3654c8700 / PQ_Finisher
  7ff365cc9700 / mds_rank_progr
  7ff3664ca700 / ms_dispatch
  7ff3684ce700 / ceph-mds
  7ff3694d0700 / safe_timer
  7ff36a4d2700 / ms_dispatch
  7ff36b4d4700 / io_context_pool
  7ff36c4d6700 / admin_socket
  7ff36ccd7700 / msgr-worker-2
  7ff36d4d8700 / msgr-worker-1
  7ff36dcd9700 / msgr-worker-0
  7ff375c9bb00 / ceph-mds
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-03.chjusj.log
--- end dump of recent events ---

Any idea?

Thanks
  Dietmar

> [...]
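PS, for completeness: the "bad backtrace" and "loaded dup inode" errors above are the kind of metadata damage that the CephFS forward scrub with repair is meant to address once an MDS stays up. A hedged sketch only (not something verified in this thread), assuming the filesystem name cephfs as above:

# ask rank 0 to scrub and repair the whole tree, then check progress and recorded damage
ceph tell mds.cephfs:0 scrub start / recursive,repair,force
ceph tell mds.cephfs:0 scrub status
ceph tell mds.cephfs:0 damage ls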
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx