I'm replying from my phone, so hopefully this works well. This sounds suspiciously similar to an issue we have run into where there is an internal loop in the MDS that doesn't have a heartbeat in it. If that loop runs for too long, the MDS is marked as failed and the process jumps to another server and starts again. We get around it by "wedging it in a corner" and removing its ability to migrate. This is as simple as stopping all standby MDS services and just waiting for the MDS to complete.

--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology

Sent from my phone, please excuse typos and brevity

________________________________
From: Lars Köppel <lars.koeppel@xxxxxxxxxx>
Sent: Saturday, January 6, 2024 7:22:14 AM
To: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: mds crashes after up:replay state

Hi Patrick,

thank you for your response. I already changed the mentioned settings, but I had no luck with this. The journal inspection I had running yesterday finished with 'Overall journal integrity: OK'. So you are probably right that the MDS is crashing shortly after the replay has finished.

I checked the logs, and every few seconds there is a new FSMap epoch without any visible changes. One of the current epochs is at the end of this message. Is there anything useful in it?

When the replay is finished, the running MDS goes to the state 'up:reconnect' and, after a second, to the state 'up:rejoin'. After this there is no new fsmap for ~20 min, until this message pops up:

> Jan 06 12:38:23 storage01 ceph-mds[223997]: mds.beacon.cephfs.storage01.pgperp Skipping beacon heartbeat to monitors (last acked 4.00012s ago); MDS internal heartbeat is not healthy!

A few seconds later (the heartbeat message is still appearing) a new fsmap is created, with a new MDS now in replay state. The last of the heartbeat messages comes after 1446 seconds; then it is gone and no more warnings or errors are shown at this point. One minute after the last message, the MDS is back as a standby MDS.

> Jan 06 13:02:26 storage01 ceph-mds[223997]: mds.beacon.cephfs.storage01.pgperp Skipping beacon heartbeat to monitors (last acked 1446.6s ago); MDS internal heartbeat is not healthy!

Also, I cannot find any warning in the logs when the MDS crashes. What could I do to find the error for the crash?
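Unless there is a better approach, a next step would be to check the crash module and raise the MDS log level before the next attempt. A rough sketch only; the debug levels below are just common starting points, and whether the crash module has captured anything on this cluster is an open question:

    # did the daemon leave a crash report behind?
    ceph crash ls
    ceph crash info <crash-id>

    # more verbose MDS logging for the next replay/rejoin attempt
    # (lower these again once the failure has been captured)
    ceph config set mds debug_mds 10
    ceph config set mds debug_ms 1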
Best regards,
Lars

e205510
> enable_multiple, ever_enabled_multiple: 1,1
> default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
> legacy client fscid: 3
>
> Filesystem 'cephfs' (3)
> fs_name cephfs
> epoch 205510
> flags 32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
> created 2023-06-06T11:44:03.651905+0000
> modified 2024-01-06T10:28:14.676738+0000
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> max_file_size 8796093022208
> required_client_features {}
> last_failure 0
> last_failure_osd_epoch 42962
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 1
> in 0
> up {0=2178448}
> failed
> damaged
> stopped
> data_pools [11,12]
> metadata_pool 10
> inline_data disabled
> balancer
> standby_count_wanted 1
> [mds.cephfs.storage01.pgperp{0:2178448} state up:replay seq 4484 join_fscid=3 addr [v2:192.168.0.101:6800/855849996,v1:192.168.0.101:6801/855849996] compat {c=[1],r=[1],i=[7ff]}]
>
>
> Filesystem 'cephfs_recovery' (4)
> fs_name cephfs_recovery
> epoch 193460
> flags 13 allow_snaps allow_multimds_snaps
> created 2024-01-05T10:47:32.224388+0000
> modified 2024-01-05T16:43:37.677241+0000
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> max_file_size 1099511627776
> required_client_features {}
> last_failure 0
> last_failure_osd_epoch 42904
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 1
> in 0
> up {}
> failed
> damaged 0
> stopped
> data_pools [11,12]
> metadata_pool 13
> inline_data disabled
> balancer
> standby_count_wanted 1
>
>
> Standby daemons:
>
> [mds.cephfs.storage02.zopcif{-1:2356728} state up:standby seq 1 join_fscid=3 addr [v2:192.168.0.102:6800/3567764205,v1:192.168.0.102:6801/3567764205] compat {c=[1],r=[1],i=[7ff]}]
> dumped fsmap epoch 205510
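For anyone hitting the same symptom: the "wedge it in a corner" approach Paul describes above (stop all standbys and wait) would look roughly like the following on a cephadm-managed cluster. The standby daemon name is taken from the dump above purely as an example; adjust to your own cluster, and on non-cephadm deployments stop the standby via systemd or the container runtime instead.

    # stop the standby so rank 0 cannot fail over mid-recovery
    ceph orch daemon stop mds.cephfs.storage02.zopcif

    # optionally give the recovering MDS more time before the mons mark it
    # failed (skip if the recovery settings from the docs are already raised)
    ceph config set mon mds_beacon_grace 3600

    # watch it work through replay / reconnect / rejoin
    ceph fs status cephfs
    ceph tell mds.cephfs:0 status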
Lars Köppel
Developer

Email: lars.koeppel@xxxxxxxxxx
Phone: +49 6221 5993580

ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Managing Director: Dr. Fabian Svara
https://ariadne.ai


On Fri, Jan 5, 2024 at 7:52 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:

> Hi Lars,
>
> On Fri, Jan 5, 2024 at 9:53 AM Lars Köppel <lars.koeppel@xxxxxxxxxx> wrote:
> >
> > Hello everyone,
> >
> > we are running a small cluster with 3 nodes and 25 OSDs per node, on
> > Ceph version 17.2.6.
> > Recently the active MDS crashed, and since then the newly started MDS
> > has always been in the up:replay state. In the output of the command
> > 'ceph tell mds.cephfs:0 status' you can see that the journal is
> > completely read in. As soon as that is finished, the MDS crashes and
> > the next one starts reading the journal.
> >
> > At the moment I have the journal inspection running
> > ('cephfs-journal-tool --rank=cephfs:0 journal inspect').
> >
> > Does anyone have any further suggestions on how I can get the cluster
> > running again as quickly as possible?
>
> Please review:
>
> https://docs.ceph.com/en/reef/cephfs/troubleshooting/#stuck-during-recovery
>
> Note: your MDS is probably not failing in up:replay but shortly after
> reaching one of the later states. Check the mon logs to see what the
> FSMap changes were.
>
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
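Following up on the "check the mon logs" suggestion above, a rough sketch of what that can look like; the log path and unit name below are only examples and depend on how the daemons are deployed:

    # non-containerised deployments with default file logging:
    grep -i 'fsmap\|up:replay\|up:reconnect\|up:rejoin' /var/log/ceph/ceph-mon.*.log

    # cephadm/journald deployments keep logs per cluster fsid instead, e.g.:
    # journalctl -u 'ceph-<fsid>@mon.<hostname>' | grep -i fsmap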