Hello again,

we were able to start our MDS server again. We performed the following steps:
ceph fs fail fdi-cephfs
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
cephfs-journal-tool --rank=cephfs:0 journal inspect
cephfs-journal-tool --rank=cephfs:all event get list
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset

After that, we were able to start the MDS server again by setting "ceph fs set fdi-cephfs max_mds 1".
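For anyone following the same procedure: the usual way to verify that rank 0 really came back after raising max_mds would be the standard status commands (nothing here is specific to our recovery):

ceph fs status fdi-cephfs   # the rank 0 MDS should be shown as active again
ceph mds stat               # compact summary of active and standby daemons
ceph health detail          # shows any remaining MDS_* health warnings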
Now we are running an online scrub:

ceph tell mds.0 scrub start /projects recursive,repair,force

In the documentation [1] there are further steps mentioned after the journal reset (like cephfs-table-tool all reset session, ceph fs reset <fs name> --yes-i-really-mean-it, cephfs-journal-tool, cephfs-data-scan ... and so forth).
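(To keep an eye on the scrub while it runs, the standard MDS admin commands should work; the target is addressed the same way as in the scrub start command above:)

ceph tell mds.0 scrub status   # shows whether the scrub is still in progress
ceph tell mds.0 damage ls      # lists any damage entries the MDS has recorded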
Is it necessary to perform all these steps? Or is it sufficient to do online scrubbing instead?
Best

[1] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

On 14.06.23 13:51, Ben Stöver wrote:
Hi everyone,

as discussed on this list before, we had an issue upgrading the metadata servers while performing an upgrade from 17.2.5 to 17.2.6. (See also https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/U3VEPCXYDYO2YSGF76CJLU25YOPEB3XU/#EVEP2MEMEI5HAXLYAXMHMWM6ZLJ2KUR6 .) We had to pause the upgrade, leaving us in a state where the MDS were running on the old version while most other cluster components were already running the newer version. In the meantime we were able to solve this issue and also reduced the number of active MDS from 2 to 1.

Today we resumed the upgrade and all components are on version 17.2.6 now. Right after the upgrade a file system inconsistency (backtrace) was found, which we solved by scrubbing. Shortly after users came back onto the cluster and started using the CephFS again, all MDS failed and cannot be started anymore. The first MDS gets stuck in state "replay (laggy)", while both standby MDS crash right away ("Caught signal (Segmentation fault)").

All MDS report multiple of the following problems in the log:
- "replayed ESubtreeMap at xxx subtree root 0x1 not in cache"
- "replayed ESubtreeMap at xxx subtree root 0x1 is not mine in cache (it's -2,-2)"

As a result, we are currently not able to access our CephFS anymore. Does anyone have an idea how to solve this or what could be the cause? (I have found this ticket, which seems very similar, but no solution is mentioned there: https://bugzilla.redhat.com/show_bug.cgi?id=2056935 )

Below are some parts of the MDS logs that seem relevant to us for this issue. We are thankful for any ideas. :-)

Best
Ben

Log excerpt of Active MDS (replay):

-140> 2023-06-14T07:51:59.585+0000 7feb588c0700 5 asok(0x55e56e23a000) register_command objecter_requests hook 0x55e56e1ca380
-139> 2023-06-14T07:51:59.585+0000 7feb588c0700 10 monclient: _renew_subs
-138> 2023-06-14T07:51:59.585+0000 7feb588c0700 10 monclient: _send_mon_message to mon.ceph-service-01 at v2:131.220.126.65:3300/0
-137> 2023-06-14T07:51:59.585+0000 7feb588c0700 10 log_channel(cluster) update_config to_monitors: true to_syslog: false syslog_facility: prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
-136> 2023-06-14T07:51:59.585+0000 7feb588c0700 4 mds.0.purge_queue operator(): data pool 3 not found in OSDMap
-135> 2023-06-14T07:51:59.585+0000 7feb588c0700 4 mds.0.0 apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 0)
-134> 2023-06-14T07:51:59.585+0000 7feb588c0700 1 mds.0.172004 handle_mds_map i am now mds.0.172004
-133> 2023-06-14T07:51:59.585+0000 7feb588c0700 1 mds.0.172004 handle_mds_map state change up:standby --> up:replay
-132> 2023-06-14T07:51:59.585+0000 7feb588c0700 5 mds.beacon.fdi-cephfs.ceph-service-01.gfwudy set_want_state: up:standby -> up:replay
-131> 2023-06-14T07:51:59.585+0000 7feb588c0700 1 mds.0.172004 replay_start
-130> 2023-06-14T07:51:59.585+0000 7feb588c0700 1 mds.0.172004 waiting for osdmap 341143 (which blocklists prior instance)
-129> 2023-06-14T07:51:59.585+0000 7feb588c0700 10 monclient: _send_mon_message to mon.ceph-service-01 at v2:131.220.126.65:3300/0
-128> 2023-06-14T07:51:59.585+0000 7feb548b8700 2 mds.0.cache Memory usage: total 300004, rss 34196, heap 182556, baseline 182556, 0 / 0 inodes have caps, 0 caps, 0 caps per inode
-127> 2023-06-14T07:51:59.603+0000 7feb588c0700 10 monclient: _renew_subs
-126> 2023-06-14T07:51:59.603+0000 7feb588c0700 10 monclient: _send_mon_message to mon.ceph-service-01 at v2:131.220.126.65:3300/0
-125> 2023-06-14T07:51:59.603+0000 7feb588c0700 10 monclient: handle_get_version_reply finishing 1 version 341143
-124> 2023-06-14T07:51:59.603+0000 7feb528b4700 2 mds.0.172004 Booting: 0: opening inotable
-123> 2023-06-14T07:51:59.603+0000 7feb528b4700 2 mds.0.172004 Booting: 0: opening sessionmap
-122> 2023-06-14T07:51:59.603+0000 7feb528b4700 2 mds.0.172004 Booting: 0: opening mds log
-121> 2023-06-14T07:51:59.603+0000 7feb528b4700 5 mds.0.log open discovering log bounds
-120> 2023-06-14T07:51:59.603+0000 7feb528b4700 2 mds.0.172004 Booting: 0: opening purge queue (async)
-119> 2023-06-14T07:51:59.603+0000 7feb528b4700 4 mds.0.purge_queue open: opening
-118> 2023-06-14T07:51:59.603+0000 7feb528b4700 1 mds.0.journaler.pq(ro) recover start
-117> 2023-06-14T07:51:59.603+0000 7feb528b4700 1 mds.0.journaler.pq(ro) read_head
-116> 2023-06-14T07:51:59.603+0000 7feb520b3700 4 mds.0.journalpointer Reading journal pointer '400.00000000'
-115> 2023-06-14T07:51:59.603+0000 7feb528b4700 2 mds.0.172004 Booting: 0: loading open file table (async)
-114> 2023-06-14T07:51:59.604+0000 7feb528b4700 2 mds.0.172004 Booting: 0: opening snap table
-113> 2023-06-14T07:51:59.604+0000 7feb5c0c7700 10 monclient: get_auth_request con 0x55e56ef7e000 auth_method 0
-112> 2023-06-14T07:51:59.604+0000 7feb5b8c6700 10 monclient: get_auth_request con 0x55e56ef7e800 auth_method 0
-111> 2023-06-14T07:51:59.604+0000 7feb5b0c5700 10 monclient: get_auth_request con 0x55e56ef7f000 auth_method 0
-110> 2023-06-14T07:51:59.604+0000 7feb5c0c7700 10 monclient: get_auth_request con 0x55e56ef92000 auth_method 0
-109> 2023-06-14T07:51:59.605+0000 7feb538b6700 1 mds.0.journaler.pq(ro) _finish_read_head loghead(trim 18366857216, expire 18367220777, write 18367220777, stream_format 1). probing for end of log (from 18367220777)...
-108> 2023-06-14T07:51:59.605+0000 7feb538b6700 1 mds.0.journaler.pq(ro) probing for end of the log
-107> 2023-06-14T07:51:59.605+0000 7feb520b3700 1 mds.0.journaler.mdlog(ro) recover start
-106> 2023-06-14T07:51:59.605+0000 7feb520b3700 1 mds.0.journaler.mdlog(ro) read_head
-105> 2023-06-14T07:51:59.605+0000 7feb520b3700 4 mds.0.log Waiting for journal 0x200 to recover...
-104> 2023-06-14T07:51:59.605+0000 7feb5b8c6700 10 monclient: get_auth_request con 0x55e56ef7f400 auth_method 0
-103> 2023-06-14T07:51:59.605+0000 7feb5c0c7700 10 monclient: get_auth_request con 0x55e56ef92800 auth_method 0
-102> 2023-06-14T07:51:59.605+0000 7feb5b0c5700 10 monclient: get_auth_request con 0x55e56ef7f800 auth_method 0
-101> 2023-06-14T07:51:59.606+0000 7feb528b4700 1 mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 320109193199616, expire 320109196838948, write 320109612836052, stream_format 1). probing for end of log (from 320109612836052)...
-100> 2023-06-14T07:51:59.606+0000 7feb528b4700 1 mds.0.journaler.mdlog(ro) probing for end of the log
-99> 2023-06-14T07:51:59.606+0000 7feb5b0c5700 10 monclient: get_auth_request con 0x55e56f2ea800 auth_method 0
-98> 2023-06-14T07:51:59.606+0000 7feb5b8c6700 10 monclient: get_auth_request con 0x55e56f2ea000 auth_method 0
-97> 2023-06-14T07:51:59.606+0000 7feb538b6700 1 mds.0.journaler.pq(ro) _finish_probe_end write_pos = 18367220777 (header had 18367220777). recovered.
-96> 2023-06-14T07:51:59.606+0000 7feb538b6700 4 mds.0.purge_queue operator(): open complete
-95> 2023-06-14T07:51:59.606+0000 7feb538b6700 1 mds.0.journaler.pq(ro) set_writeable
-94> 2023-06-14T07:51:59.607+0000 7feb528b4700 1 mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 320109613139120 (header had 320109612836052). recovered.
-93> 2023-06-14T07:51:59.607+0000 7feb520b3700 4 mds.0.log Journal 0x200 recovered.
-92> 2023-06-14T07:51:59.607+0000 7feb520b3700 4 mds.0.log Recovered journal 0x200 in format 1
-91> 2023-06-14T07:51:59.607+0000 7feb520b3700 2 mds.0.172004 Booting: 1: loading/discovering base inodes
-90> 2023-06-14T07:51:59.607+0000 7feb520b3700 0 mds.0.cache creating system inode with ino:0x100
-89> 2023-06-14T07:51:59.607+0000 7feb520b3700 0 mds.0.cache creating system inode with ino:0x1
-88> 2023-06-14T07:51:59.608+0000 7feb5c0c7700 10 monclient: get_auth_request con 0x55e56f2eb400 auth_method 0
-87> 2023-06-14T07:51:59.615+0000 7feb528b4700 2 mds.0.172004 Booting: 2: replaying mds log
-86> 2023-06-14T07:51:59.615+0000 7feb528b4700 2 mds.0.172004 Booting: 2: waiting for purge queue recovered
-85> 2023-06-14T07:51:59.615+0000 7feb5b8c6700 10 monclient: get_auth_request con 0x55e56f1cec00 auth_method 0
-84> 2023-06-14T07:51:59.615+0000 7feb5b0c5700 10 monclient: get_auth_request con 0x55e56f2eb800 auth_method 0
-83> 2023-06-14T07:51:59.615+0000 7feb5c0c7700 10 monclient: get_auth_request con 0x55e56f1ce400 auth_method 0
-82> 2023-06-14T07:51:59.615+0000 7feb5b8c6700 10 monclient: get_auth_request con 0x55e56ef7e400 auth_method 0
-81> 2023-06-14T07:51:59.615+0000 7feb5b0c5700 10 monclient: get_auth_request con 0x55e56f1cf400 auth_method 0
-80> 2023-06-14T07:51:59.634+0000 7feb510b1700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109197452549 subtree root 0x1 not in cache
-79> 2023-06-14T07:51:59.634+0000 7feb510b1700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
-78> 2023-06-14T07:51:59.634+0000 7feb510b1700 0 mds.0.journal journal ambig_subtrees:
-77> 2023-06-14T07:51:59.637+0000 7feb510b1700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109198067908 subtree root 0x1 not in cache
-76> 2023-06-14T07:51:59.637+0000 7feb510b1700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
-75> 2023-06-14T07:51:59.637+0000 7feb510b1700 0 mds.0.journal journal ambig_subtrees:
-74> 2023-06-14T07:51:59.640+0000 7feb510b1700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109198682422 subtree root 0x1 not in cache
-73> 2023-06-14T07:51:59.640+0000 7feb510b1700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}

Log excerpt of standby MDS:

-23> 2023-06-14T07:48:56.590+0000 7f14fcb97700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109205645870 subtree root 0x1 is not mine in cache (it's -2,-2)
-22> 2023-06-14T07:48:56.590+0000 7f14fcb97700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
-21> 2023-06-14T07:48:56.590+0000 7f14fcb97700 0 mds.0.journal journal ambig_subtrees:
-20> 2023-06-14T07:48:56.592+0000 7f14fcb97700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109206260384 subtree root 0x1 is not mine in cache (it's -2,-2)
-19> 2023-06-14T07:48:56.592+0000 7f14fcb97700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
-18> 2023-06-14T07:48:56.592+0000 7f14fcb97700 0 mds.0.journal journal ambig_subtrees:
-17> 2023-06-14T07:48:56.595+0000 7f14fcb97700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109206875743 subtree root 0x1 is not mine in cache (it's -2,-2)
-16> 2023-06-14T07:48:56.595+0000 7f14fcb97700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
-15> 2023-06-14T07:48:56.595+0000 7f14fcb97700 0 mds.0.journal journal ambig_subtrees:
-14> 2023-06-14T07:48:56.598+0000 7f14fcb97700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109207491102 subtree root 0x1 is not mine in cache (it's -2,-2)
-13> 2023-06-14T07:48:56.598+0000 7f14fcb97700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
-12> 2023-06-14T07:48:56.598+0000 7f14fcb97700 0 mds.0.journal journal ambig_subtrees:
-11> 2023-06-14T07:48:56.601+0000 7f14fcb97700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 320109208105616 subtree root 0x1 is not mine in cache (it's -2,-2)
-10> 2023-06-14T07:48:56.601+0000 7f14fcb97700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
-9> 2023-06-14T07:48:56.601+0000 7f14fcb97700 0 mds.0.journal journal ambig_subtrees:
-8> 2023-06-14T07:48:56.638+0000 7f1507bad700 10 monclient: get_auth_request con 0x556935cb3400 auth_method 0
-7> 2023-06-14T07:48:56.677+0000 7f15073ac700 10 monclient: get_auth_request con 0x556935c92c00 auth_method 0
-6> 2023-06-14T07:48:56.687+0000 7f1506bab700 10 monclient: get_auth_request con 0x5569359f6400 auth_method 0
-5> 2023-06-14T07:48:56.967+0000 7f1507bad700 10 monclient: get_auth_request con 0x556935cb2800 auth_method 0
-4> 2023-06-14T07:48:57.008+0000 7f15073ac700 10 monclient: get_auth_request con 0x556935cb8400 auth_method 0
-3> 2023-06-14T07:48:57.159+0000 7f1506bab700 10 monclient: get_auth_request con 0x556935cb3800 auth_method 0
-2> 2023-06-14T07:48:57.169+0000 7f1507bad700 10 monclient: get_auth_request con 0x556935cb9000 auth_method 0
-1> 2023-06-14T07:48:57.197+0000 7f15073ac700 10 monclient: get_auth_request con 0x5569359f7400 auth_method 0
0> 2023-06-14T07:48:57.317+0000 7f14fcb97700 -1 *** Caught signal (Segmentation fault) **
in thread 7f14fcb97700 thread_name:md_log_replay