(Resending this; the original message seems not to have made it through among all the spam recently sent to the list. My apologies if it shows up twice at some point.)

Hi List,

we are still struggling to get our CephFS back online. This is an update on what we have done so far, and we kindly ask for any input to get an idea of how to proceed.

After resetting the journals, Xiubo suggested (in a PM) to go on with the disaster recovery procedure. cephfs-data-scan init skipped creating the inodes 0x1 and 0x100:

[root@ceph01-b ~]# cephfs-data-scan init
Inode 0x0x1 already exists, skipping create.  Use --force-init to overwrite the existing object.
Inode 0x0x100 already exists, skipping create.  Use --force-init to overwrite the existing object.

We did not use --force-init and proceeded with scan_extents using a single worker, which was indeed very slow. After ~24h we interrupted the scan_extents run and restarted it with 32 workers, which went through in about 2h15min without any issue.

Then I started scan_inodes with 32 workers; this also finished, after ~50min, with no output on stderr or stdout.

I went on with scan_links, which after ~45 minutes threw the following error:

# cephfs-data-scan scan_links
Error ((2) No such file or directory)

Then "cephfs-data-scan cleanup" went through without any message and took about 9h20min.

Unfortunately, when starting the MDS, the CephFS still appears to be damaged.
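In case it helps anyone retracing our steps: the 32-worker runs were started along these lines, one process per shard via the tool's --worker_n/--worker_m flags. This is only a sketch; the data pool name "cephfs_data" is a placeholder for your actual pool, and the leading echo makes it a dry run (drop the echo and background each command with & plus a final wait to launch for real):

```shell
# Dry-run sketch: print one scan_extents invocation per worker shard.
# Assumptions: pool name "cephfs_data" is a placeholder; remove "echo"
# (and background each worker) to actually run the scan.
NWORKERS=32
for i in $(seq 0 $((NWORKERS - 1))); do
    echo cephfs-data-scan scan_extents --worker_n "$i" --worker_m "$NWORKERS" cephfs_data
done
```

The same worker_n/worker_m sharding applies to scan_inodes.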
I get quite a few "loaded already corrupt dentry:" messages and two "[ERR] : bad backtrace on directory inode" errors. (In the following log I removed almost all "loaded already corrupt dentry" entries for clarity.)

2024-06-23T08:06:20.934+0000 7ff05728fb00  0 set uid:gid to 167:167 (ceph:ceph)
2024-06-23T08:06:20.934+0000 7ff05728fb00  0 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, pid 2
2024-06-23T08:06:20.934+0000 7ff05728fb00  1 main not setting numa affinity
2024-06-23T08:06:20.934+0000 7ff05728fb00  0 pidfile_write: ignore empty --pid-file
2024-06-23T08:06:20.936+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8062 from mon.0
2024-06-23T08:06:21.583+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8063 from mon.0
2024-06-23T08:06:21.583+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Monitors have assigned me to become a standby.
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8064 from mon.0
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now mds.0.8064
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map state change up:standby --> up:replay
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064 replay_start
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064  waiting for osdmap 34327 (which blocklists prior instance)
2024-06-23T08:06:21.627+0000 7ff0452b9700  0 mds.0.cache creating system inode with ino:0x100
2024-06-23T08:06:21.627+0000 7ff0452b9700  0 mds.0.cache creating system inode with ino:0x1
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.journal EResetJournal
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.sessionmap wipe start
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.sessionmap wipe result
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.sessionmap wipe done
2024-06-23T08:06:21.656+0000 7ff045aba700  1 mds.0.8064 Finished replaying journal
2024-06-23T08:06:21.656+0000 7ff045aba700  1 mds.0.8064 making mds journal writeable
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8065 from mon.0
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now mds.0.8064
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map state change up:replay --> up:reconnect
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 reconnect_start
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 reopen_log
2024-06-23T08:06:22.605+0000 7ff04bac6700  1 mds.0.8064 reconnect_done
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8066 from mon.0
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now mds.0.8064
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map state change up:reconnect --> up:rejoin
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.0.8064 rejoin_start
2024-06-23T08:06:23.609+0000 7ff04bac6700  1 mds.0.8064 rejoin_joint_start
2024-06-23T08:06:23.611+0000 7ff045aba700  1 mds.0.cache.den(0x10000000000 groups) loaded already corrupt dentry: [dentry #0x1/data/groups [bf,head] rep@0.0 NULL (dversion lock) pv=0 v=7910497 ino=(nil) state=0 0x55cf13e9f400]
2024-06-23T08:06:23.611+0000 7ff045aba700  1 mds.0.cache.den(0x1000192ec16.1* scad_prj) loaded already corrupt dentry: [dentry #0x1/home/scad_prj [159,head] rep@0.0 NULL (dversion lock) pv=0 v=2462060 ino=(nil) state=0 0x55cf14220a00]
[...]
2024-06-23T08:06:23.628+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
2024-06-23T08:06:23.668+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
[...]
2024-06-23T08:06:23.773+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp respawn!
--- begin dump of recent events ---
 -9999> 2024-06-23T08:06:23.615+0000 7ff045aba700  1 mds.0.cache.den(0x1000321976d jupyterhub_slurmspawner_67002.log) loaded already corrupt dentry: [dentry #0x1/home/michelotto/jupyterhub_slurmspawner_67002.log [239,36c] rep@0.0 NULL (dversion lock) pv=0 v=26855917 ino=(nil) state=0 0x55cf147ff680]
[...]
 -9878> 2024-06-23T08:06:23.616+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf13f13c00 auth_method 0
[...]
 -9877> 2024-06-23T08:06:23.616+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf146d1800 auth_method 0
 -9458> 2024-06-23T08:06:23.620+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f13400 auth_method 0
 -9353> 2024-06-23T08:06:23.622+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf146d0800 auth_method 0
 -8980> 2024-06-23T08:06:23.625+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14d9f400 auth_method 0
 -8978> 2024-06-23T08:06:23.625+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf14d9fc00 auth_method 0
 -8849> 2024-06-23T08:06:23.628+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
 -8574> 2024-06-23T08:06:23.633+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14323400 auth_method 0
 -8570> 2024-06-23T08:06:23.633+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf1485e400 auth_method 0
 -8564> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14322c00 auth_method 0
 -8561> 2024-06-23T08:06:23.633+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f12800 auth_method 0
 -8555> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14f48000 auth_method 0
 -8546> 2024-06-23T08:06:23.633+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf1485f800 auth_method 0
 -8541> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf1516b000 auth_method 0
 -8470> 2024-06-23T08:06:23.634+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14322400 auth_method 0
 -8451> 2024-06-23T08:06:23.635+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f13800 auth_method 0
 -8445> 2024-06-23T08:06:23.635+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf1485e800 auth_method 0
 -8243> 2024-06-23T08:06:23.637+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf141ce400 auth_method 0
 -7381> 2024-06-23T08:06:23.645+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf1485f400 auth_method 0
 -6319> 2024-06-23T08:06:23.660+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14f48400 auth_method 0
 -5946> 2024-06-23T08:06:23.666+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf146d1400 auth_method 0
[...]
 -5677> 2024-06-23T08:06:23.668+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
[...]
    -8> 2024-06-23T08:06:23.753+0000 7ff045aba700  5 mds.beacon.default.cephmon-01.cepqjp set_want_state: up:rejoin -> down:damaged
    -7> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client log_queue is 2 last_log 2 sent 0 num 2 unsent 2 sending 2
    -6> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client will send 2024-06-23T08:06:23.629743+0000 mds.default.cephmon-01.cepqjp (mds.0) 1 : cluster [ERR] bad backtrace on directory inode 0x10003e42340
    -5> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client will send 2024-06-23T08:06:23.669673+0000 mds.default.cephmon-01.cepqjp (mds.0) 2 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
    -4> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 monclient: _send_mon_message to mon.cephmon-01 at v2:10.1.3.21:3300/0
    -3> 2024-06-23T08:06:23.753+0000 7ff045aba700  5 mds.beacon.default.cephmon-01.cepqjp Sending beacon down:damaged seq 4
    -2> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 monclient: _send_mon_message to mon.cephmon-01 at v2:10.1.3.21:3300/0
    -1> 2024-06-23T08:06:23.773+0000 7ff04e2cb700  5 mds.beacon.default.cephmon-01.cepqjp received beacon reply down:damaged seq 4 rtt 0.0200001
     0> 2024-06-23T08:06:23.773+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp respawn!
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/ 5 rgw_access
   1/ 5 rgw_dbstore
   1/ 5 rgw_flight
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_t
   0/ 5 seastore_cleaner
   0/ 5 seastore_epm
   0/ 5 seastore_lba
   0/ 5 seastore_fixedkv_tree
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 seastore_backref
   0/ 5 alienstore
   1/ 5 mclock
   0/ 5 cyanstore
   1/ 5 ceph_exporter
   1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7ff045aba700 / MR_Finisher
  7ff04e2cb700 / msgr-worker-2
  7ff04eacc700 / msgr-worker-1
  7ff04f2cd700 / msgr-worker-0
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-01.cepqjp.log
--- end dump of recent events ---
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  e: '/usr/bin/ceph-mds'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  0: '/usr/bin/ceph-mds'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  1: '-n'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  2: 'mds.default.cephmon-01.cepqjp'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  3: '-f'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  4: '--setuser'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  5: 'ceph'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  6: '--setgroup'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  7: 'ceph'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  8: '--default-log-to-file=false'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  9: '--default-log-to-journald=true'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp 10: '--default-log-to-stderr=false'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp respawning with exe /usr/bin/ceph-mds
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  exe_path /proc/self/exe
2024-06-23T08:06:23.812+0000 7fe58d619b00  0 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, pid 2
2024-06-23T08:06:23.812+0000 7fe58d619b00  1 main not setting numa affinity
2024-06-23T08:06:23.813+0000 7fe58d619b00  0 pidfile_write: ignore empty --pid-file
2024-06-23T08:06:23.814+0000 7fe58226e700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8067 from mon.0
2024-06-23T08:06:24.772+0000 7fe58226e700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8068 from mon.0
2024-06-23T08:06:24.772+0000 7fe58226e700  1 mds.default.cephmon-01.cepqjp Monitors have assigned me to become a standby.
2024-06-23T08:49:28.778+0000 7fe584272700  1 mds.default.cephmon-01.cepqjp asok_command: heap {heapcmd=stats,prefix=heap} (starting...)
2024-06-23T22:00:04.664+0000 7fe583a71700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0

Any ideas how to proceed? Would rerunning the cephfs-data-scan sequence do any harm, or would it give us a chance to resolve this?

Would removing the "snapBackup_head" omap keys help to fix the bad backtrace errors, e.g. for

[ERR] : bad backtrace on directory inode 0x10003e42340

These are the corresponding omap values:

# rados --cluster ceph -p ssd-rep-metadata-pool listomapvals 10003e42340.00000000
snapBackup_head
value (484 bytes) :
00000000  23 04 00 00 00 00 00 00  49 13 06 b9 01 00 00 41  |#.......I......A|
00000010  23 e4 03 00 01 00 00 00  00 00 00 a0 70 72 66 c6  |#...........prf.|
00000020  ba 9f 32 ed 41 00 00 07  9d 00 00 50 c3 00 00 01  |..2.A......P....|
00000030  00 00 00 00 02 00 00 00  00 00 00 00 02 02 18 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 ff ff  |................|
00000050  ff ff ff ff ff ff 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 01 00 00 00 ff ff  ff ff ff ff ff ff 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 a0 70 72 66 c6 ba  |...........prf..|
00000080  9f 32 a0 70 72 66 75 78  90 32 00 00 00 00 00 00  |.2.prfux.2......|
00000090  00 00 03 02 28 00 00 00  00 00 00 00 00 00 00 00  |....(...........|
000000a0  a0 70 72 66 c6 ba 9f 32  01 00 00 00 00 00 00 00  |.prf...2........|
000000b0  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000000c0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 b6 16  |..8.............|
000000d0  00 00 00 00 00 00 01 00  00 00 00 00 00 00 01 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 2a 74  72 66 bc b9 6c 07 03 02  |......*trf..l...|
00000100  38 00 00 00 00 00 00 00  00 00 00 00 b6 16 00 00  |8...............|
00000110  00 00 00 00 01 00 00 00  00 00 00 00 01 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  00 00 00 00 2a 74 72 66  bc b9 6c 07 26 05 00 00  |....*trf..l.&...|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
00000150  00 00 00 00 02 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000170  00 00 00 00 01 01 10 00  00 00 00 00 00 00 00 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 a0 70  |...............p|
000001a0  72 66 75 78 90 32 01 00  00 00 00 00 00 00 ff ff  |rfux.2..........|
000001b0  ff ff 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  fe ff ff ff ff ff ff ff  |................|
000001e0  00 00 00 00                                       |....|
000001e4

Thanks for any help

Dietmar

On 6/19/24 13:42, Dietmar Rieder wrote:
> On 6/19/24 11:15, Dietmar Rieder wrote:
>> On 6/19/24 10:30, Xiubo Li wrote:
>>>
>>> On 6/19/24 16:13, Dietmar Rieder wrote:
>>>> Hi Xiubo,
>>>>
>>> [...]
>>>>>>
>>>>>>     0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
>>>>>>  in thread 7f90fa912700 thread_name:md_log_replay
>>>>>>
>>>>>>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>>>>>  1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
>>>>>>  2: gsignal()
>>>>>>  3: abort()
>>>>>>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
>>>>>>  5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
>>>>>>  6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
>>>>>>  7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
>>>>>>  8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
>>>>>>  9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
>>>>>>  10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
>>>>>>  11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
>>>>>>  12: clone()
>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>
>>>>> This is a known bug, please see https://tracker.ceph.com/issues/61009.
>>>>>
>>>>> As a workaround I am afraid you need to trim the journal logs first and then try to restart the MDS daemons. And at the same time please follow the workaround in https://tracker.ceph.com/issues/61009#note-26
>>>>
>>>> I see, I'll try to do this. Are there any caveats or issues to expect by trimming the journal logs?
>>>>
>>> Certainly you will lose the dirty metadata in the journals.
>>>
>>>> Is there a step by step guide on how to perform the trimming? Should all MDS be stopped before?
>>>>
>>> Please follow https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts.
>>
>> OK, when I run the cephfs-journal-tool I get an error:
>>
>> # cephfs-journal-tool journal export backup.bin
>> Error ((22) Invalid argument)
>>
>> My cluster is managed by cephadm, so (in my stress situation) I'm not able to find the correct way to use cephfs-journal-tool.
>>
>> I'm sure it is something stupid that I'm missing, but I'd be happy for any hint.
>
> I ran the disaster recovery procedures now, as follows:
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> Events by type:
>   OPEN: 8737
>   PURGED: 1
>   SESSION: 9
>   SESSIONS: 2
>   SUBTREEMAP: 128
>   TABLECLIENT: 2
>   TABLESERVER: 30
>   UPDATE: 9207
> Errors: 0
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
> Events by type:
>   OPEN: 3
>   SESSION: 1
>   SUBTREEMAP: 34
>   UPDATE: 32965
> Errors: 0
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 event recover_dentries summary
> Events by type:
>   OPEN: 5289
>   SESSION: 10
>   SESSIONS: 3
>   SUBTREEMAP: 128
>   UPDATE: 76448
> Errors: 0
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:all journal inspect
> Overall journal integrity: OK
> Overall journal integrity: DAMAGED
> Corrupt regions:
>   0xd9a84f243c-ffffffffffffffff
> Overall journal integrity: OK
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal inspect
> Overall journal integrity: OK
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
> Overall journal integrity: DAMAGED
> Corrupt regions:
>   0xd9a84f243c-ffffffffffffffff
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal inspect
> Overall journal integrity: OK
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal reset
> old journal was 879331755046~508520587
> new journal start will be 879843344384 (3068751 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal reset
> old journal was 934711229813~120432327
> new journal start will be 934834864128 (3201988 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal reset
> old journal was 1334153584288~252692691
> new journal start will be 1334409428992 (3152013 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
>
> [root@ceph01-b /]# cephfs-table-tool all reset session
> {
>     "0": {
>         "data": {},
>         "result": 0
>     },
>     "1": {
>         "data": {},
>         "result": 0
>     },
>     "2": {
>         "data": {},
>         "result": 0
>     }
> }
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
> Overall journal integrity: OK
>
> [root@ceph01-b /]# ceph fs reset cephfs --yes-i-really-mean-it
>
> But now I hit the error below:
>
>    -20> 2024-06-19T11:13:00.610+0000 7ff3694d0700 10 monclient: _send_mon_message to mon.cephmon-03 at v2:10.1.3.23:3300/0
>    -19> 2024-06-19T11:13:00.637+0000 7ff3664ca700  2 mds.0.cache Memory usage:  total 485928, rss 170860, heap 207156, baseline 182580, 0 / 33434 inodes have caps, 0 caps, 0 caps per inode
>    -18> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.default.cephmon-03.chjusj Updating MDS map to version 8061 from mon.1
>    -17> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 handle_mds_map i am now mds.0.8058
>    -16> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 handle_mds_map state change up:rejoin --> up:active
>    -15> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 recovery_done -- successful recovery!
>    -14> 2024-06-19T11:13:00.788+0000 7ff36a4d2700  1 mds.0.8058 active_start
>    -13> 2024-06-19T11:13:00.789+0000 7ff36dcd9700  5 mds.beacon.default.cephmon-03.chjusj received beacon reply up:active seq 4 rtt 0.955007
>    -12> 2024-06-19T11:13:00.790+0000 7ff36a4d2700  1 mds.0.8058 cluster recovered.
>    -11> 2024-06-19T11:13:00.790+0000 7ff36a4d2700  4 mds.0.8058 set_osd_epoch_barrier: epoch=33596
>    -10> 2024-06-19T11:13:00.790+0000 7ff3634c4700  5 mds.0.log _submit_thread 879843344432~2609 : EUpdate check_inode_max_size [metablob 0x100, 2 dirs]
>     -9> 2024-06-19T11:13:00.791+0000 7ff3644c6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7ff3644c6700 time 2024-06-19T11:13:00.791580+0000
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: 1660: FAILED ceph_assert(follows >= realm->get_newest_seq())
>
>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7ff374ad3e15]
>  2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
>  3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
>  4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
>  5: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
>  6: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
>  7: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
>  8: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
>  9: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
>  10: (Context::complete(int)+0xd) [0x55da0a6775fd]
>  11: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
>  12: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
>  13: clone()
>
>     -8> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client handle_log_ack log(last 7) v1
>     -7> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.647346+0000 mds.default.cephmon-03.chjusj (mds.0) 1 : cluster [ERR] loaded dup inode 0x10003e45d99 [415,head] v61632 at /home/balaz/.bash_history-54696.tmp, but inode 0x10003e45d99.head v61639 already exists at /home/balaz/.bash_history
>     -6> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.648139+0000 mds.default.cephmon-03.chjusj (mds.0) 2 : cluster [ERR] loaded dup inode 0x10003e45d7c [415,head] v253612 at /home/rieder/.bash_history-10215.tmp, but inode 0x10003e45d7c.head v253630 already exists at /home/rieder/.bash_history
>     -5> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.649483+0000 mds.default.cephmon-03.chjusj (mds.0) 3 : cluster [ERR] loaded dup inode 0x10003e45d83 [415,head] v164103 at /home/gottschling/.bash_history-44802.tmp, but inode 0x10003e45d83.head v164112 already exists at /home/gottschling/.bash_history
>     -4> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.656221+0000 mds.default.cephmon-03.chjusj (mds.0) 4 : cluster [ERR] bad backtrace on directory inode 0x10003e42340
>     -3> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.737282+0000 mds.default.cephmon-03.chjusj (mds.0) 5 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
>     -2> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.804984+0000 mds.default.cephmon-03.chjusj (mds.0) 6 : cluster [ERR] bad backtrace on directory inode 0x10003e45d9f
>     -1> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.805078+0000 mds.default.cephmon-03.chjusj (mds.0) 7 : cluster [ERR] bad backtrace on directory inode 0x10003e45d90
>      0> 2024-06-19T11:13:00.792+0000 7ff3644c6700 -1 *** Caught signal (Aborted) **
>  in thread 7ff3644c6700 thread_name:MR_Finisher
>
>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>  1: /lib64/libpthread.so.0(+0x12d20) [0x7ff373883d20]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7ff374ad3e6f]
>  5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
>  6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
>  7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
>  8: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
>  9: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
>  10: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
>  11: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
>  12: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
>  13: (Context::complete(int)+0xd) [0x55da0a6775fd]
>  14: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
>  15: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
>  16: clone()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 rbd_pwl
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 immutable_obj_cache
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 0 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 1 reserver
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 rgw_sync
>    1/ 5 rgw_datacache
>    1/ 5 rgw_access
>    1/ 5 rgw_dbstore
>    1/ 5 rgw_flight
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 fuse
>    2/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>    1/ 5 prioritycache
>    0/ 5 test
>    0/ 5 cephfs_mirror
>    0/ 5 cephsqlite
>    0/ 5 seastore
>    0/ 5 seastore_onode
>    0/ 5 seastore_odata
>    0/ 5 seastore_omap
>    0/ 5 seastore_tm
>    0/ 5 seastore_t
>    0/ 5 seastore_cleaner
>    0/ 5 seastore_epm
>    0/ 5 seastore_lba
>    0/ 5 seastore_fixedkv_tree
>    0/ 5 seastore_cache
>    0/ 5 seastore_journal
>    0/ 5 seastore_device
>    0/ 5 seastore_backref
>    0/ 5 alienstore
>    1/ 5 mclock
>    0/ 5 cyanstore
>    1/ 5 ceph_exporter
>    1/ 5 memstore
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
> --- pthread ID / name mapping for recent threads ---
>   7ff362cc3700 /
>   7ff3634c4700 / md_submit
>   7ff363cc5700 /
>   7ff3644c6700 / MR_Finisher
>   7ff3654c8700 / PQ_Finisher
>   7ff365cc9700 / mds_rank_progr
>   7ff3664ca700 / ms_dispatch
>   7ff3684ce700 / ceph-mds
>   7ff3694d0700 / safe_timer
>   7ff36a4d2700 / ms_dispatch
>   7ff36b4d4700 / io_context_pool
>   7ff36c4d6700 / admin_socket
>   7ff36ccd7700 / msgr-worker-2
>   7ff36d4d8700 / msgr-worker-1
>   7ff36dcd9700 / msgr-worker-0
>   7ff375c9bb00 / ceph-mds
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-mds.default.cephmon-03.chjusj.log
> --- end dump of recent events ---
>
> Any idea?
>
> Thanks
>
> Dietmar
>
> [...]
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
_________________________________________________________
D i e t m a r  R i e d e r
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402 | Mobile: +43 676 8716 72402
Email: dietmar.rieder@xxxxxxxxxxx
Web:   http://www.icbi.at
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx