On Mon, Jun 24, 2024 at 5:22 PM Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote:
>
> (resending this; the original message doesn't seem to have made it through amid all the spam recently sent to the list, my apologies if it ends up arriving twice)
>
> Hi List,
>
> we are still struggling to get our cephfs back online. This is an update on what we have done so far, and we kindly ask for any input to get an idea of how to proceed:
>
> After resetting the journals, Xiubo suggested (in a PM) going on with the disaster recovery procedure:
>
> cephfs-data-scan init skipped creating the inodes 0x0x1 and 0x0x100:
>
> [root@ceph01-b ~]# cephfs-data-scan init
> Inode 0x0x1 already exists, skipping create. Use --force-init to overwrite the existing object.
> Inode 0x0x100 already exists, skipping create. Use --force-init to overwrite the existing object.
>
> We did not use --force-init and proceeded with scan_extents using a single worker, which was indeed very slow.
>
> After ~24h we interrupted scan_extents and restarted it with 32 workers, which went through in about 2h15min w/o any issue.
>
> Then I started scan_inodes with 32 workers; it also finished after ~50min with no output on stderr or stdout.
>
> I went on with scan_links, which after ~45 minutes threw the following error:
>
> # cephfs-data-scan scan_links
> Error ((2) No such file or directory)

Not sure what this indicates necessarily. You can try to get more debug
information using:

[client]
  debug mds = 20
  debug ms = 1
  debug client = 20

in the local ceph.conf for the node running cephfs-data-scan.

> Then "cephfs-data-scan cleanup" went through w/o any message and took about 9hrs 20min.
>
> Unfortunately, when starting the MDS, the cephfs still seems to be damaged. I get quite a few "loaded already corrupt dentry:" messages and two "[ERR] : bad backtrace on directory inode" errors:

The "corrupt dentry" message is erroneous and fixed already (backports
in flight).

> (In the following log I removed almost all "loaded already corrupt dentry" entries, for clarity)
>
> 2024-06-23T08:06:20.934+0000 7ff05728fb00 0 set uid:gid to 167:167 (ceph:ceph)
> 2024-06-23T08:06:20.934+0000 7ff05728fb00 0 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, pid 2
> 2024-06-23T08:06:20.934+0000 7ff05728fb00 1 main not setting numa affinity
> 2024-06-23T08:06:20.934+0000 7ff05728fb00 0 pidfile_write: ignore empty --pid-file
> 2024-06-23T08:06:20.936+0000 7ff04bac6700 1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8062 from mon.0
> 2024-06-23T08:06:21.583+0000 7ff04bac6700 1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8063 from mon.0
> 2024-06-23T08:06:21.583+0000 7ff04bac6700 1 mds.default.cephmon-01.cepqjp Monitors have assigned me to become a standby.
> 2024-06-23T08:06:21.604+0000 7ff04bac6700 1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8064 from mon.0
> 2024-06-23T08:06:21.604+0000 7ff04bac6700 1 mds.0.8064 handle_mds_map i am now mds.0.8064
> 2024-06-23T08:06:21.604+0000 7ff04bac6700 1 mds.0.8064 handle_mds_map state change up:standby --> up:replay
> 2024-06-23T08:06:21.604+0000 7ff04bac6700 1 mds.0.8064 replay_start
> 2024-06-23T08:06:21.604+0000 7ff04bac6700 1 mds.0.8064 waiting for osdmap 34327 (which blocklists prior instance)
> 2024-06-23T08:06:21.627+0000 7ff0452b9700 0 mds.0.cache creating system inode with ino:0x100
> 2024-06-23T08:06:21.627+0000 7ff0452b9700 0 mds.0.cache creating system inode with ino:0x1
> 2024-06-23T08:06:21.636+0000 7ff0442b7700 1 mds.0.journal EResetJournal
> 2024-06-23T08:06:21.636+0000 7ff0442b7700 1 mds.0.sessionmap wipe start
> 2024-06-23T08:06:21.636+0000 7ff0442b7700 1 mds.0.sessionmap wipe result
> 2024-06-23T08:06:21.636+0000 7ff0442b7700 1 mds.0.sessionmap wipe done
> 2024-06-23T08:06:21.656+0000 7ff045aba700 1 mds.0.8064 Finished replaying journal
> 2024-06-23T08:06:21.656+0000 7ff045aba700 1 mds.0.8064 making mds journal writeable
> 2024-06-23T08:06:22.604+0000 7ff04bac6700 1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8065 from mon.0
> 2024-06-23T08:06:22.604+0000 7ff04bac6700 1 mds.0.8064 handle_mds_map i am now mds.0.8064
> 2024-06-23T08:06:22.604+0000 7ff04bac6700 1 mds.0.8064 handle_mds_map state change up:replay --> up:reconnect
> 2024-06-23T08:06:22.604+0000 7ff04bac6700 1 mds.0.8064 reconnect_start
> 2024-06-23T08:06:22.604+0000 7ff04bac6700 1 mds.0.8064 reopen_log
> 2024-06-23T08:06:22.605+0000 7ff04bac6700 1 mds.0.8064 reconnect_done
> 2024-06-23T08:06:23.605+0000 7ff04bac6700 1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8066 from mon.0
> 2024-06-23T08:06:23.605+0000 7ff04bac6700 1 mds.0.8064 handle_mds_map i am now mds.0.8064
> 2024-06-23T08:06:23.605+0000 7ff04bac6700 1 mds.0.8064 handle_mds_map state change up:reconnect --> up:rejoin
> 2024-06-23T08:06:23.605+0000 7ff04bac6700 1 mds.0.8064 rejoin_start
> 2024-06-23T08:06:23.609+0000 7ff04bac6700 1 mds.0.8064 rejoin_joint_start
> 2024-06-23T08:06:23.611+0000 7ff045aba700 1 mds.0.cache.den(0x10000000000 groups) loaded already corrupt dentry: [dentry #0x1/data/groups [bf,head] rep@0.0 NULL (dversion lock) pv=0 v=7910497 ino=(nil) state=0 0x55cf13e9f400]
> 2024-06-23T08:06:23.611+0000 7ff045aba700 1 mds.0.cache.den(0x1000192ec16.1* scad_prj) loaded already corrupt dentry: [dentry #0x1/home/scad_prj [159,head] rep@0.0 NULL (dversion lock) pv=0 v=2462060 ino=(nil) state=0 0x55cf14220a00]
> [...]
> 2024-06-23T08:06:23.628+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
> 2024-06-23T08:06:23.668+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
> [...]
> 2024-06-23T08:06:23.773+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp respawn!
> --- begin dump of recent events ---
> -9999> 2024-06-23T08:06:23.615+0000 7ff045aba700 1 mds.0.cache.den(0x1000321976d jupyterhub_slurmspawner_67002.log) loaded already corrupt dentry: [dentry #0x1/home/michelotto/jupyterhub_slurmspawner_67002.log [239,36c] rep@0.0 NULL (dversion lock) pv=0 v=26855917 ino=(nil) state=0 0x55cf147ff680]
> [...]
> -9878> 2024-06-23T08:06:23.616+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf13f13c00 auth_method 0
> [...]
> -9877> 2024-06-23T08:06:23.616+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf146d1800 auth_method 0
> -9458> 2024-06-23T08:06:23.620+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f13400 auth_method 0
> -9353> 2024-06-23T08:06:23.622+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf146d0800 auth_method 0
> -8980> 2024-06-23T08:06:23.625+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14d9f400 auth_method 0
> -8978> 2024-06-23T08:06:23.625+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf14d9fc00 auth_method 0
> -8849> 2024-06-23T08:06:23.628+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
> -8574> 2024-06-23T08:06:23.633+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14323400 auth_method 0
> -8570> 2024-06-23T08:06:23.633+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf1485e400 auth_method 0
> -8564> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14322c00 auth_method 0
> -8561> 2024-06-23T08:06:23.633+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f12800 auth_method 0
> -8555> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14f48000 auth_method 0
> -8546> 2024-06-23T08:06:23.633+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf1485f800 auth_method 0
> -8541> 2024-06-23T08:06:23.633+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf1516b000 auth_method 0
> -8470> 2024-06-23T08:06:23.634+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf14322400 auth_method 0
> -8451> 2024-06-23T08:06:23.635+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf13f13800 auth_method 0
> -8445> 2024-06-23T08:06:23.635+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf1485e800 auth_method 0
> -8243> 2024-06-23T08:06:23.637+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf141ce400 auth_method 0
> -7381> 2024-06-23T08:06:23.645+0000 7ff04f2cd700 10 monclient: get_auth_request con 0x55cf1485f400 auth_method 0
> -6319> 2024-06-23T08:06:23.660+0000 7ff04eacc700 10 monclient: get_auth_request con 0x55cf14f48400 auth_method 0
> -5946> 2024-06-23T08:06:23.666+0000 7ff04e2cb700 10 monclient: get_auth_request con 0x55cf146d1400 auth_method 0
> [...]
> -5677> 2024-06-23T08:06:23.668+0000 7ff045aba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
> [...]
> -8> 2024-06-23T08:06:23.753+0000 7ff045aba700 5 mds.beacon.default.cephmon-01.cepqjp set_want_state: up:rejoin -> down:damaged
> -7> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client log_queue is 2 last_log 2 sent 0 num 2 unsent 2 sending 2
> -6> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client will send 2024-06-23T08:06:23.629743+0000 mds.default.cephmon-01.cepqjp (mds.0) 1 : cluster [ERR] bad backtrace on directory inode 0x10003e42340
> -5> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 log_client will send 2024-06-23T08:06:23.669673+0000 mds.default.cephmon-01.cepqjp (mds.0) 2 : cluster [ERR] bad backtrace on directory inode 0x10003e45d8b
> -4> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 monclient: _send_mon_message to mon.cephmon-01 at v2:10.1.3.21:3300/0
> -3> 2024-06-23T08:06:23.753+0000 7ff045aba700 5 mds.beacon.default.cephmon-01.cepqjp Sending beacon down:damaged seq 4
> -2> 2024-06-23T08:06:23.753+0000 7ff045aba700 10 monclient: _send_mon_message to mon.cephmon-01 at v2:10.1.3.21:3300/0
> -1> 2024-06-23T08:06:23.773+0000 7ff04e2cb700 5 mds.beacon.default.cephmon-01.cepqjp received beacon reply down:damaged seq 4 rtt 0.0200001
> 0> 2024-06-23T08:06:23.773+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp respawn!
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 rbd_mirror
> 0/ 5 rbd_replay
> 0/ 5 rbd_pwl
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 immutable_obj_cache
> 0/ 5 client
> 1/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 journal
> 0/ 0 ms
> 1/ 5 mon
> 0/10 monc
> 1/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 1 reserver
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/ 5 rgw_sync
> 1/ 5 rgw_datacache
> 1/ 5 rgw_access
> 1/ 5 rgw_dbstore
> 1/ 5 rgw_flight
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> 0/ 0 refs
> 1/ 5 compressor
> 1/ 5 bluestore
> 1/ 5 bluefs
> 1/ 3 bdev
> 1/ 5 kstore
> 4/ 5 rocksdb
> 4/ 5 leveldb
> 1/ 5 fuse
> 2/ 5 mgr
> 1/ 5 mgrc
> 1/ 5 dpdk
> 1/ 5 eventtrace
> 1/ 5 prioritycache
> 0/ 5 test
> 0/ 5 cephfs_mirror
> 0/ 5 cephsqlite
> 0/ 5 seastore
> 0/ 5 seastore_onode
> 0/ 5 seastore_odata
> 0/ 5 seastore_omap
> 0/ 5 seastore_tm
> 0/ 5 seastore_t
> 0/ 5 seastore_cleaner
> 0/ 5 seastore_epm
> 0/ 5 seastore_lba
> 0/ 5 seastore_fixedkv_tree
> 0/ 5 seastore_cache
> 0/ 5 seastore_journal
> 0/ 5 seastore_device
> 0/ 5 seastore_backref
> 0/ 5 alienstore
> 1/ 5 mclock
> 0/ 5 cyanstore
> 1/ 5 ceph_exporter
> 1/ 5 memstore
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> --- pthread ID / name mapping for recent threads ---
> 7ff045aba700 / MR_Finisher
> 7ff04e2cb700 / msgr-worker-2
> 7ff04eacc700 / msgr-worker-1
> 7ff04f2cd700 / msgr-worker-0
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/ceph-mds.default.cephmon-01.cepqjp.log
> --- end dump of recent events ---
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp e: '/usr/bin/ceph-mds'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 0: '/usr/bin/ceph-mds'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 1: '-n'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 2: 'mds.default.cephmon-01.cepqjp'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 3: '-f'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 4: '--setuser'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 5: 'ceph'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 6: '--setgroup'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 7: 'ceph'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 8: '--default-log-to-file=false'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 9: '--default-log-to-journald=true'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp 10: '--default-log-to-stderr=false'
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp respawning with exe /usr/bin/ceph-mds
> 2024-06-23T08:06:23.786+0000 7ff045aba700 1 mds.default.cephmon-01.cepqjp exe_path /proc/self/exe
> 2024-06-23T08:06:23.812+0000 7fe58d619b00 0 ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, pid 2
> 2024-06-23T08:06:23.812+0000 7fe58d619b00 1 main not setting numa affinity
> 2024-06-23T08:06:23.813+0000 7fe58d619b00 0 pidfile_write: ignore empty --pid-file
> 2024-06-23T08:06:23.814+0000 7fe58226e700 1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8067 from mon.0
> 2024-06-23T08:06:24.772+0000 7fe58226e700 1 mds.default.cephmon-01.cepqjp Updating MDS map to version 8068 from mon.0
> 2024-06-23T08:06:24.772+0000 7fe58226e700 1 mds.default.cephmon-01.cepqjp Monitors have assigned me to become a standby.
> 2024-06-23T08:49:28.778+0000 7fe584272700 1 mds.default.cephmon-01.cepqjp asok_command: heap {heapcmd=stats,prefix=heap} (starting...)
> 2024-06-23T22:00:04.664+0000 7fe583a71700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>
>
> Any ideas how to proceed?

Whatever you snipped from the log has the real error. The MDS tries to
recover from both "ERR" messages you are concerned about. It should not
go damaged.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
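
When a rank ends up down:damaged like this, one common follow-up (a rough
sketch only, not specific advice from this thread; "cephfs" below is a
placeholder for the actual filesystem name) is to raise MDS debug logging so
the next rejoin attempt records the full error, then clear the damaged flag
so rank 0 retries:

    # capture more detail on the next startup attempt
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1

    # clear the damaged flag on rank 0 so the MDS retries rejoin
    # ("cephfs" is a placeholder for the real filesystem name)
    ceph mds repaired cephfs:0

    # once a rank is active again, its recorded damage can be listed with
    ceph tell mds.cephfs:0 damage ls

Any entries reported by "damage ls" should be reviewed before attempting
further repairs.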
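
A note on the parallel scan_extents/scan_inodes runs described earlier in the
thread: cephfs-data-scan splits a pass across workers via its --worker_n and
--worker_m options, one process per worker. A minimal sketch is below; the
data pool name "cephfs_data" and the worker count are placeholders, not taken
from this cluster, and this is not necessarily the poster's exact invocation:

    # worker_m = total number of workers, worker_n = this worker's index
    for n in $(seq 0 31); do
      cephfs-data-scan scan_extents --worker_n $n --worker_m 32 cephfs_data &
    done
    wait

    # same pattern for the inode pass
    for n in $(seq 0 31); do
      cephfs-data-scan scan_inodes --worker_n $n --worker_m 32 cephfs_data &
    done
    wait

All workers of one phase must complete before the next phase is started.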