(Resending this; the original message seems not to have made it through among all the spam recently sent to the list. My apologies if it shows up twice at some point.)

Hi List,

we are still struggling to get our CephFS back online. This is an update on what we have done so far, and we kindly ask for any input to get an idea of how to proceed.

After resetting the journals, Xiubo suggested (in a PM) to go on with the disaster recovery procedure. cephfs-data-scan init skipped creating the inodes 0x1 and 0x100:

[root@ceph01-b ~]# cephfs-data-scan init
Inode 0x0x1 already exists, skipping create.  Use --force-init to overwrite the existing object.
Inode 0x0x100 already exists, skipping create.  Use --force-init to overwrite the existing object.

We did not use --force-init and proceeded with scan_extents using a single worker, which was indeed very slow. After ~24h we interrupted the scan_extents run and restarted it with 32 workers, which went through in about 2h15min without any issue.

Then I started scan_inodes with 32 workers; this also finished, after ~50min, with no output on stderr or stdout.

I went on with scan_links, which after ~45 minutes threw the following error:

# cephfs-data-scan scan_links
Error ((2) No such file or directory)

Then "cephfs-data-scan cleanup" went through without any message and took about 9h20min.

Unfortunately, when starting the MDS, the CephFS still appears to be damaged.
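In case it helps anyone retracing our steps: the 32-worker runs were started along these lines, one process per shard via the tool's --worker_n/--worker_m flags. This is only a sketch; the data pool name "cephfs_data" is a placeholder for your actual pool, and the leading echo makes it a dry run (drop the echo and background each command with & plus a final wait to launch for real):

```shell
# Dry-run sketch: print one scan_extents invocation per worker shard.
# Assumptions: pool name "cephfs_data" is a placeholder; remove "echo"
# (and background each worker) to actually run the scan.
NWORKERS=32
for i in $(seq 0 $((NWORKERS - 1))); do
    echo cephfs-data-scan scan_extents --worker_n "$i" --worker_m "$NWORKERS" cephfs_data
done
```

The same worker_n/worker_m sharding applies to scan_inodes.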
I get quite a few "loaded already corrupt dentry:" messages and two "[ERR] : bad backtrace on directory inode" errors. (In the following log I removed almost all "loaded already corrupt dentry" entries for clarity.)

2024-06-23T08:06:20.934+0000 7ff05728fb00  0 set uid:gid to 167:167 (ceph:ceph)
2024-06-23T08:06:20.934+0000 7ff05728fb00  0 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, pid 2
2024-06-23T08:06:20.934+0000 7ff05728fb00  1 main not setting numa affinity
2024-06-23T08:06:20.934+0000 7ff05728fb00  0 pidfile_write: ignore empty --pid-file
2024-06-23T08:06:20.936+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8062 from mon.0
2024-06-23T08:06:21.583+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8063 from mon.0
2024-06-23T08:06:21.583+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Monitors have assigned me to become a standby.
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8064 from mon.0
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now mds.0.8064
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map state change up:standby --> up:replay
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064 replay_start
2024-06-23T08:06:21.604+0000 7ff04bac6700  1 mds.0.8064  waiting for osdmap 34327 (which blocklists prior instance)
2024-06-23T08:06:21.627+0000 7ff0452b9700  0 mds.0.cache creating system inode with ino:0x100
2024-06-23T08:06:21.627+0000 7ff0452b9700  0 mds.0.cache creating system inode with ino:0x1
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.journal EResetJournal
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.sessionmap wipe start
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.sessionmap wipe result
2024-06-23T08:06:21.636+0000 7ff0442b7700  1 mds.0.sessionmap wipe done
2024-06-23T08:06:21.656+0000 7ff045aba700  1 mds.0.8064 Finished replaying journal
2024-06-23T08:06:21.656+0000 7ff045aba700  1 mds.0.8064 making mds journal writeable
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8065 from mon.0
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now mds.0.8064
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map state change up:replay --> up:reconnect
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 reconnect_start
2024-06-23T08:06:22.604+0000 7ff04bac6700  1 mds.0.8064 reopen_log
2024-06-23T08:06:22.605+0000 7ff04bac6700  1 mds.0.8064 reconnect_done
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8066 from mon.0
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now mds.0.8064
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.0.8064 handle_mds_map state change up:reconnect --> up:rejoin
2024-06-23T08:06:23.605+0000 7ff04bac6700  1 mds.0.8064 rejoin_start
2024-06-23T08:06:23.609+0000 7ff04bac6700  1 mds.0.8064 rejoin_joint_start
2024-06-23T08:06:23.611+0000 7ff045aba700  1 mds.0.cache.den(0x10000000000 groups) loaded already corrupt dentry: [dentry #0x1/data/groups [bf,head] rep@0.0 NULL (dversion lock) pv=0 v=7910497 ino=(nil) state=0 0x55cf13e9f400]
2024-06-23T08:06:23.611+0000 7ff045aba700  1 mds.0.cache.den(0x1000192ec16.1* scad_prj) loaded already corrupt dentry: [dentry #0x1/home/scad_prj [159,head] rep@0.0 NULL (dversion lock) pv=0 v=2462060 ino=(nil) state=0 0x55cf14220a00]
[...]
2024-06-23T08:06:23.628+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
2024-06-23T08:06:23.668+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
[...]
2024-06-23T08:06:23.773+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp respawn!
--- begin dump of recent events ---
 -9999> 2024-06-23T08:06:23.615+0000 7ff045aba700  1 mds.0.cache.den(0x1000321976d jupyterhub_slurmspawner_67002.log) loaded already corrupt dentry: [dentry #0x1/home/michelotto/jupyterhub_slurmspawner_67002.log [239,36c] rep@0.0 NULL (dversion lock) pv=0 v=26855917 ino=(nil) state=0 0x55cf147ff680]
[...]
 -9878> 2024-06-23T08:06:23.616+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf13f13c00 auth_method 0
[...]
 -9877> 2024-06-23T08:06:23.616+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf146d1800 auth_method 0
 -9458> 2024-06-23T08:06:23.620+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f13400 auth_method 0
 -9353> 2024-06-23T08:06:23.622+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf146d0800 auth_method 0
 -8980> 2024-06-23T08:06:23.625+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14d9f400 auth_method 0
 -8978> 2024-06-23T08:06:23.625+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf14d9fc00 auth_method 0
 -8849> 2024-06-23T08:06:23.628+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
 -8574> 2024-06-23T08:06:23.633+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14323400 auth_method 0
 -8570> 2024-06-23T08:06:23.633+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf1485e400 auth_method 0
 -8564> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14322c00 auth_method 0
 -8561> 2024-06-23T08:06:23.633+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f12800 auth_method 0
 -8555> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14f48000 auth_method 0
 -8546> 2024-06-23T08:06:23.633+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf1485f800 auth_method 0
 -8541> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf1516b000 auth_method 0
 -8470> 2024-06-23T08:06:23.634+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14322400 auth_method 0
 -8451> 2024-06-23T08:06:23.635+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f13800 auth_method 0
 -8445> 2024-06-23T08:06:23.635+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf1485e800 auth_method 0
 -8243> 2024-06-23T08:06:23.637+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf141ce400 auth_method 0
 -7381> 2024-06-23T08:06:23.645+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf1485f400 auth_method 0
 -6319> 2024-06-23T08:06:23.660+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14f48400 auth_method 0
 -5946> 2024-06-23T08:06:23.666+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf146d1400 auth_method 0
[...]
 -5677> 2024-06-23T08:06:23.668+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
[...]
    -8> 2024-06-23T08:06:23.753+0000 7ff045aba700  5 mds.beacon.default.cephmon-01.cepqjp set_want_state: up:rejoin -> down:damaged
    -7> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client log_queue is 2 last_log 2 sent 0 num 2 unsent 2 sending 2
    -6> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client will send 2024-06-23T08:06:23.629743+0000 mds.default.cephmon-01.cepqjp (mds.0) 1 : cluster [ERR] bad backtrace on directory inode 0x10003e42340
    -5> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client will send 2024-06-23T08:06:23.669673+0000 mds.default.cephmon-01.cepqjp (mds.0) 2 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
    -4> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 monclient: _send_mon_message to mon.cephmon-01 at v2:10.1.3.21:3300/0
    -3> 2024-06-23T08:06:23.753+0000 7ff045aba700  5 mds.beacon.default.cephmon-01.cepqjp Sending beacon down:damaged seq 4
    -2> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 monclient: _send_mon_message to mon.cephmon-01 at v2:10.1.3.21:3300/0
    -1> 2024-06-23T08:06:23.773+0000 7ff04e2cb700  5 mds.beacon.default.cephmon-01.cepqjp received beacon reply down:damaged seq 4 rtt 0.0200001
     0> 2024-06-23T08:06:23.773+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp respawn!
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/ 5 rgw_access
   1/ 5 rgw_dbstore
   1/ 5 rgw_flight
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_t
   0/ 5 seastore_cleaner
   0/ 5 seastore_epm
   0/ 5 seastore_lba
   0/ 5 seastore_fixedkv_tree
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 seastore_backref
   0/ 5 alienstore
   1/ 5 mclock
   0/ 5 cyanstore
   1/ 5 ceph_exporter
   1/ 5 memstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7ff045aba700 / MR_Finisher
  7ff04e2cb700 / msgr-worker-2
  7ff04eacc700 / msgr-worker-1
  7ff04f2cd700 / msgr-worker-0
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.default.cephmon-01.cepqjp.log
--- end dump of recent events ---
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  e: '/usr/bin/ceph-mds'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  0: '/usr/bin/ceph-mds'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  1: '-n'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  2: 'mds.default.cephmon-01.cepqjp'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  3: '-f'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  4: '--setuser'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  5: 'ceph'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  6: '--setgroup'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  7: 'ceph'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  8: '--default-log-to-file=false'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  9: '--default-log-to-journald=true'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp 10: '--default-log-to-stderr=false'
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp respawning with exe /usr/bin/ceph-mds
2024-06-23T08:06:23.786+0000 7ff045aba700  1 mds.default.cephmon-01.cepqjp  exe_path /proc/self/exe
2024-06-23T08:06:23.812+0000 7fe58d619b00  0 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, pid 2
2024-06-23T08:06:23.812+0000 7fe58d619b00  1 main not setting numa affinity
2024-06-23T08:06:23.813+0000 7fe58d619b00  0 pidfile_write: ignore empty --pid-file
2024-06-23T08:06:23.814+0000 7fe58226e700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8067 from mon.0
2024-06-23T08:06:24.772+0000 7fe58226e700  1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8068 from mon.0
2024-06-23T08:06:24.772+0000 7fe58226e700  1 mds.default.cephmon-01.cepqjp Monitors have assigned me to become a standby.
2024-06-23T08:49:28.778+0000 7fe584272700  1 mds.default.cephmon-01.cepqjp asok_command: heap {heapcmd=stats,prefix=heap} (starting...)
2024-06-23T22:00:04.664+0000 7fe583a71700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0

Any ideas how to proceed? Would rerunning the cephfs-data-scan sequence do any harm, or would it give us a chance to resolve this?

Would removing the "snapBackup_head" omap keys help to fix the bad backtrace errors, e.g. for

[ERR] : bad backtrace on directory inode 0x10003e42340

These are the corresponding omap values:

# rados --cluster ceph -p ssd-rep-metadata-pool listomapvals 10003e42340.00000000
snapBackup_head
value (484 bytes) :
00000000  23 04 00 00 00 00 00 00  49 13 06 b9 01 00 00 41  |#.......I......A|
00000010  23 e4 03 00 01 00 00 00  00 00 00 a0 70 72 66 c6  |#...........prf.|
00000020  ba 9f 32 ed 41 00 00 07  9d 00 00 50 c3 00 00 01  |..2.A......P....|
00000030  00 00 00 00 02 00 00 00  00 00 00 00 02 02 18 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 ff ff  |................|
00000050  ff ff ff ff ff ff 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 01 00 00 00 ff ff  ff ff ff ff ff ff 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 a0 70 72 66 c6 ba  |...........prf..|
00000080  9f 32 a0 70 72 66 75 78  90 32 00 00 00 00 00 00  |.2.prfux.2......|
00000090  00 00 03 02 28 00 00 00  00 00 00 00 00 00 00 00  |....(...........|
000000a0  a0 70 72 66 c6 ba 9f 32  01 00 00 00 00 00 00 00  |.prf...2........|
000000b0  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000000c0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 b6 16  |..8.............|
000000d0  00 00 00 00 00 00 01 00  00 00 00 00 00 00 01 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 2a 74  72 66 bc b9 6c 07 03 02  |......*trf..l...|
00000100  38 00 00 00 00 00 00 00  00 00 00 00 b6 16 00 00  |8...............|
00000110  00 00 00 00 01 00 00 00  00 00 00 00 01 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  00 00 00 00 2a 74 72 66  bc b9 6c 07 26 05 00 00  |....*trf..l.&...|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
00000150  00 00 00 00 02 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000170  00 00 00 00 01 01 10 00  00 00 00 00 00 00 00 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 a0 70  |...............p|
000001a0  72 66 75 78 90 32 01 00  00 00 00 00 00 00 ff ff  |rfux.2..........|
000001b0  ff ff 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  fe ff ff ff ff ff ff ff  |................|
000001e0  00 00 00 00                                       |....|
000001e4

Thanks for any help

Dietmar

On 6/19/24 13:42, Dietmar Rieder wrote:
> On 6/19/24 11:15, Dietmar Rieder wrote:
>> On 6/19/24 10:30, Xiubo Li wrote:
>>>
>>> On 6/19/24 16:13, Dietmar Rieder wrote:
>>>> Hi Xiubo,
>>>>
>>> [...]
>>>>>>
>>>>>>     0> 2024-06-19T07:12:39.236+0000 7f90fa912700 -1 *** Caught signal (Aborted) **
>>>>>>  in thread 7f90fa912700 thread_name:md_log_replay
>>>>>>
>>>>>>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>>>>>>  1: /lib64/libpthread.so.0(+0x12d20) [0x7f910b4d2d20]
>>>>>>  2: gsignal()
>>>>>>  3: abort()
>>>>>>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f910c722e6f]
>>>>>>  5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
>>>>>>  6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x55a93c0de9a5]
>>>>>>  7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
>>>>>>  8: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
>>>>>>  9: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
>>>>>>  10: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
>>>>>>  11: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
>>>>>>  12: clone()
>>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>
>>>>> This is a known bug, please see https://tracker.ceph.com/issues/61009.
>>>>>
>>>>> As a workaround I am afraid you need to trim the journal logs first and then try to restart the MDS daemons. And at the same time please follow the workaround in https://tracker.ceph.com/issues/61009#note-26
>>>>
>>>> I see, I'll try to do this. Are there any caveats or issues to expect by trimming the journal logs?
>>>>
>>> Certainly you will lose the dirty metadata in the journals.
>>>
>>>> Is there a step by step guide on how to perform the trimming? Should all MDS be stopped before?
>>>>
>>> Please follow https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts.
>>
>> OK, when I run the cephfs-journal-tool I get an error:
>>
>> # cephfs-journal-tool journal export backup.bin
>> Error ((22) Invalid argument)
>>
>> My cluster is managed by cephadm, so (in my stress situation) I'm not able to find the correct way to use cephfs-journal-tool.
>>
>> I'm sure it is something stupid that I'm missing, but I'd be happy for any hint.
>
> I ran the disaster recovery procedures now, as follows:
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> Events by type:
>   OPEN: 8737
>   PURGED: 1
>   SESSION: 9
>   SESSIONS: 2
>   SUBTREEMAP: 128
>   TABLECLIENT: 2
>   TABLESERVER: 30
>   UPDATE: 9207
> Errors: 0
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
> Events by type:
>   OPEN: 3
>   SESSION: 1
>   SUBTREEMAP: 34
>   UPDATE: 32965
> Errors: 0
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 event recover_dentries summary
> Events by type:
>   OPEN: 5289
>   SESSION: 10
>   SESSIONS: 3
>   SUBTREEMAP: 128
>   UPDATE: 76448
> Errors: 0
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:all journal inspect
> Overall journal integrity: OK
> Overall journal integrity: DAMAGED
> Corrupt regions:
>   0xd9a84f243c-ffffffffffffffff
> Overall journal integrity: OK
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal inspect
> Overall journal integrity: OK
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
> Overall journal integrity: DAMAGED
> Corrupt regions:
>   0xd9a84f243c-ffffffffffffffff
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal inspect
> Overall journal integrity: OK
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:0 journal reset
> old journal was 879331755046~508520587
> new journal start will be 879843344384 (3068751 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal reset
> old journal was 934711229813~120432327
> new journal start will be 934834864128 (3201988 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:2 journal reset
> old journal was 1334153584288~252692691
> new journal start will be 1334409428992 (3152013 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
>
> [root@ceph01-b /]# cephfs-table-tool all reset session
> {
>     "0": {
>         "data": {},
>         "result": 0
>     },
>     "1": {
>         "data": {},
>         "result": 0
>     },
>     "2": {
>         "data": {},
>         "result": 0
>     }
> }
>
> [root@ceph01-b /]# cephfs-journal-tool --rank=cephfs:1 journal inspect
> Overall journal integrity: OK
>
> [root@ceph01-b /]# ceph fs reset cephfs --yes-i-really-mean-it
>
> But now I hit the error below:
>
>    -20> 2024-06-19T11:13:00.610+0000 7ff3694d0700 10 monclient: _send_mon_message to mon.cephmon-03 at v2:10.1.3.23:3300/0
>    -19> 2024-06-19T11:13:00.637+0000 7ff3664ca700  2 mds.0.cache Memory usage:  total 485928, rss 170860, heap 207156, baseline 182580, 0 / 33434 inodes have caps, 0 caps, 0 caps per inode
>    -18> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.default.cephmon-03.chjusj Updating MDS map to version 8061 from mon.1
>    -17> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 handle_mds_map i am now mds.0.8058
>    -16> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 handle_mds_map state change up:rejoin --> up:active
>    -15> 2024-06-19T11:13:00.787+0000 7ff36a4d2700  1 mds.0.8058 recovery_done -- successful recovery!
>    -14> 2024-06-19T11:13:00.788+0000 7ff36a4d2700  1 mds.0.8058 active_start
>    -13> 2024-06-19T11:13:00.789+0000 7ff36dcd9700  5 mds.beacon.default.cephmon-03.chjusj received beacon reply up:active seq 4 rtt 0.955007
>    -12> 2024-06-19T11:13:00.790+0000 7ff36a4d2700  1 mds.0.8058 cluster recovered.
>    -11> 2024-06-19T11:13:00.790+0000 7ff36a4d2700  4 mds.0.8058 set_osd_epoch_barrier: epoch=33596
>    -10> 2024-06-19T11:13:00.790+0000 7ff3634c4700  5 mds.0.log _submit_thread 879843344432~2609 : EUpdate check_inode_max_size [metablob 0x100, 2 dirs]
>     -9> 2024-06-19T11:13:00.791+0000 7ff3644c6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7ff3644c6700 time 2024-06-19T11:13:00.791580+0000
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/mds/MDCache.cc: 1660: FAILED ceph_assert(follows >= realm->get_newest_seq())
>
>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7ff374ad3e15]
>  2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
>  3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
>  4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
>  5: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
>  6: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
>  7: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
>  8: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
>  9: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
>  10: (Context::complete(int)+0xd) [0x55da0a6775fd]
>  11: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
>  12: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
>  13: clone()
>
>     -8> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client handle_log_ack log(last 7) v1
>     -7> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.647346+0000 mds.default.cephmon-03.chjusj (mds.0) 1 : cluster [ERR] loaded dup inode 0x10003e45d99 [415,head] v61632 at /home/balaz/.bash_history-54696.tmp, but inode 0x10003e45d99.head v61639 already exists at /home/balaz/.bash_history
>     -6> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.648139+0000 mds.default.cephmon-03.chjusj (mds.0) 2 : cluster [ERR] loaded dup inode 0x10003e45d7c [415,head] v253612 at /home/rieder/.bash_history-10215.tmp, but inode 0x10003e45d7c.head v253630 already exists at /home/rieder/.bash_history
>     -5> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.649483+0000 mds.default.cephmon-03.chjusj (mds.0) 3 : cluster [ERR] loaded dup inode 0x10003e45d83 [415,head] v164103 at /home/gottschling/.bash_history-44802.tmp, but inode 0x10003e45d83.head v164112 already exists at /home/gottschling/.bash_history
>     -4> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.656221+0000 mds.default.cephmon-03.chjusj (mds.0) 4 : cluster [ERR] bad backtrace on directory inode 0x10003e42340
>     -3> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.737282+0000 mds.default.cephmon-03.chjusj (mds.0) 5 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
>     -2> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.804984+0000 mds.default.cephmon-03.chjusj (mds.0) 6 : cluster [ERR] bad backtrace on directory inode 0x10003e45d9f
>     -1> 2024-06-19T11:13:00.792+0000 7ff36a4d2700 10 log_client logged 2024-06-19T11:12:59.805078+0000 mds.default.cephmon-03.chjusj (mds.0) 7 : cluster [ERR] bad backtrace on directory inode 0x10003e45d90
>      0> 2024-06-19T11:13:00.792+0000 7ff3644c6700 -1 *** Caught signal (Aborted) **
>  in thread 7ff3644c6700 thread_name:MR_Finisher
>
>  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>  1: /lib64/libpthread.so.0(+0x12d20) [0x7ff373883d20]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7ff374ad3e6f]
>  5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7ff374ad3fdb]
>  6: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0x13c7) [0x55da0a7aa227]
>  7: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xc5) [0x55da0a7aa3a5]
>  8: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x84d) [0x55da0a88ce3d]
>  9: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x4f0) [0x55da0a85ad50]
>  10: (MDSContext::complete(int)+0x5f) [0x55da0a9ddeef]
>  11: (MDSIOContextBase::complete(int)+0x524) [0x55da0a9de674]
>  12: (Filer::C_Probe::finish(int)+0xbb) [0x55da0aa9dc9b]
>  13: (Context::complete(int)+0xd) [0x55da0a6775fd]
>  14: (Finisher::finisher_thread_entry()+0x18d) [0x7ff374b77abd]
>  15: /lib64/libpthread.so.0(+0x81ca) [0x7ff3738791ca]
>  16: clone()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 rbd_pwl
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 immutable_obj_cache
>    0/ 5 client
>    1/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 0 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 1 reserver
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 rgw_sync
>    1/ 5 rgw_datacache
>    1/ 5 rgw_access
>    1/ 5 rgw_dbstore
>    1/ 5 rgw_flight
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 compressor
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 fuse
>    2/ 5 mgr
>    1/ 5 mgrc
>    1/ 5 dpdk
>    1/ 5 eventtrace
>    1/ 5 prioritycache
>    0/ 5 test
>    0/ 5 cephfs_mirror
>    0/ 5 cephsqlite
>    0/ 5 seastore
>    0/ 5 seastore_onode
>    0/ 5 seastore_odata
>    0/ 5 seastore_omap
>    0/ 5 seastore_tm
>    0/ 5 seastore_t
>    0/ 5 seastore_cleaner
>    0/ 5 seastore_epm
>    0/ 5 seastore_lba
>    0/ 5 seastore_fixedkv_tree
>    0/ 5 seastore_cache
>    0/ 5 seastore_journal
>    0/ 5 seastore_device
>    0/ 5 seastore_backref
>    0/ 5 alienstore
>    1/ 5 mclock
>    0/ 5 cyanstore
>    1/ 5 ceph_exporter
>    1/ 5 memstore
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
> --- pthread ID / name mapping for recent threads ---
>   7ff362cc3700 /
>   7ff3634c4700 / md_submit
>   7ff363cc5700 /
>   7ff3644c6700 / MR_Finisher
>   7ff3654c8700 / PQ_Finisher
>   7ff365cc9700 / mds_rank_progr
>   7ff3664ca700 / ms_dispatch
>   7ff3684ce700 / ceph-mds
>   7ff3694d0700 / safe_timer
>   7ff36a4d2700 / ms_dispatch
>   7ff36b4d4700 / io_context_pool
>   7ff36c4d6700 / admin_socket
>   7ff36ccd7700 / msgr-worker-2
>   7ff36d4d8700 / msgr-worker-1
>   7ff36dcd9700 / msgr-worker-0
>   7ff375c9bb00 / ceph-mds
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-mds.default.cephmon-03.chjusj.log
> --- end dump of recent events ---
>
> Any idea?
>
> Thanks
>
> Dietmar
>
> [...]
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
_________________________________________________________
D i e t m a r  R i e d e r
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402 | Mobile: +43 676 8716 72402
Email: dietmar.rieder@xxxxxxxxxxx
Web:   http://www.icbi.at
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx