Hi Paul,

your suggestion was correct. The MDS went through the replay state and was in
the active state for a few minutes. But then it got killed because its memory
consumption was too high:

> @mds.cephfs.storage01.pgperp.service: Main process exited, code=exited,
> status=137/n/a

How can I raise the memory limit for the MDS?

From the looks of it in htop, there seems to be a memory leak: the MDS consumed
over 200 GB of memory while reporting that it actually used only 20-30 GB. Is
this possible?

Best regards
Lars

Lars Köppel
Developer
Email: lars.koeppel@xxxxxxxxxx
Phone: +49 6221 5993580
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai
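(Exit status 137 means the process was ended by SIGKILL, which in a situation
like this usually points at the kernel OOM killer or a systemd/cgroup memory
limit rather than at the MDS cache setting itself. Below is a minimal sketch of
how one might confirm that and inspect or raise the relevant limits; the 64 GiB
value and the <mds-unit> name are placeholders, not recommendations.)

    # Check whether the kernel OOM killer ended the MDS process.
    dmesg -T | grep -i -E 'out of memory|oom-kill'
    journalctl -k | grep -i oom

    # mds_cache_memory_limit is a cache *target*, not a hard cap; an MDS that
    # is replaying a large journal can temporarily use far more than this,
    # which would match the 200 GB observed in htop.
    ceph config get mds mds_cache_memory_limit
    ceph config set mds mds_cache_memory_limit 68719476736   # 64 GiB, placeholder

    # If the daemon runs under a systemd/cgroup memory limit, it shows up here
    # (find the full unit name with: systemctl list-units 'ceph-*mds*').
    systemctl show <mds-unit> -p MemoryMax -p MemoryCurrent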
On Sat, Jan 6, 2024 at 3:33 PM Paul Mezzanini <pfmeec@xxxxxxx> wrote:

> I'm replying from my phone, so hopefully this works well. This sounds
> suspiciously similar to an issue we have run into where there is an internal
> loop in the MDS that doesn't have a heartbeat in it. If that loop runs for
> too long, the MDS is marked as failed and the process jumps to another server
> and starts again.
>
> We get around it by "wedging it in a corner" and removing the ability to
> migrate. This is as simple as stopping all standby MDS services and just
> waiting for the MDS to complete.
>
> --
> Paul Mezzanini
> Platform Engineer III
> Research Computing
> Rochester Institute of Technology
>
> Sent from my phone, please excuse typos and brevity
> ------------------------------
> *From:* Lars Köppel <lars.koeppel@xxxxxxxxxx>
> *Sent:* Saturday, January 6, 2024 7:22:14 AM
> *To:* Patrick Donnelly <pdonnell@xxxxxxxxxx>
> *Cc:* ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> *Subject:* Re: mds crashes after up:replay state
>
> Hi Patrick,
>
> thank you for your response.
> I already changed the mentioned settings, but had no luck with them.
>
> The journal inspection I had running yesterday finished with 'Overall
> journal integrity: OK'. So you are probably right that the MDS is crashing
> shortly after the replay has finished.
>
> I checked the logs, and every few seconds there is a new FSMap epoch without
> any visible changes. One of the current epochs is at the end of this mail.
> Is there anything useful in it?
>
> When the replay is finished, the running MDS goes to the state
> 'up:reconnect' and, a second later, to the state 'up:rejoin'. After this
> there is no new FSMap for ~20 min, until this message pops up:
>
> > Jan 06 12:38:23 storage01 ceph-mds[223997]:
> > mds.beacon.cephfs.storage01.pgperp Skipping beacon heartbeat to monitors
> > (last acked 4.00012s ago); MDS internal heartbeat is not healthy!
>
> A few seconds later (while the heartbeat message is still appearing) a new
> FSMap is created, with a new MDS now in the replay state.
> The last of the heartbeat messages comes after 1446 seconds. Then it is
> gone, and no more warnings or errors are shown at this point. One minute
> after the last message the MDS is back as a standby MDS.
>
> > Jan 06 13:02:26 storage01 ceph-mds[223997]:
> > mds.beacon.cephfs.storage01.pgperp Skipping beacon heartbeat to monitors
> > (last acked 1446.6s ago); MDS internal heartbeat is not healthy!
>
> Also, I cannot find any warning in the logs when the MDS crashes. What could
> I do to find the cause of the crash?
>
> Best regards
> Lars
>
> > e205510
> > enable_multiple, ever_enabled_multiple: 1,1
> > default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
> > legacy client fscid: 3
> >
> > Filesystem 'cephfs' (3)
> > fs_name cephfs
> > epoch 205510
> > flags 32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
> > created 2023-06-06T11:44:03.651905+0000
> > modified 2024-01-06T10:28:14.676738+0000
> > tableserver 0
> > root 0
> > session_timeout 60
> > session_autoclose 300
> > max_file_size 8796093022208
> > required_client_features {}
> > last_failure 0
> > last_failure_osd_epoch 42962
> > compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> > max_mds 1
> > in 0
> > up {0=2178448}
> > failed
> > damaged
> > stopped
> > data_pools [11,12]
> > metadata_pool 10
> > inline_data disabled
> > balancer
> > standby_count_wanted 1
> > [mds.cephfs.storage01.pgperp{0:2178448} state up:replay seq 4484 join_fscid=3 addr [v2:192.168.0.101:6800/855849996,v1:192.168.0.101:6801/855849996] compat {c=[1],r=[1],i=[7ff]}]
> >
> >
> > Filesystem 'cephfs_recovery' (4)
> > fs_name cephfs_recovery
> > epoch 193460
> > flags 13 allow_snaps allow_multimds_snaps
> > created 2024-01-05T10:47:32.224388+0000
> > modified 2024-01-05T16:43:37.677241+0000
> > tableserver 0
> > root 0
> > session_timeout 60
> > session_autoclose 300
> > max_file_size 1099511627776
> > required_client_features {}
> > last_failure 0
> > last_failure_osd_epoch 42904
> > compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> > max_mds 1
> > in 0
> > up {}
> > failed
> > damaged 0
> > stopped
> > data_pools [11,12]
> > metadata_pool 13
> > inline_data disabled
> > balancer
> > standby_count_wanted 1
> >
> >
> > Standby daemons:
> >
> > [mds.cephfs.storage02.zopcif{-1:2356728} state up:standby seq 1 join_fscid=3 addr [v2:192.168.0.102:6800/3567764205,v1:192.168.0.102:6801/3567764205] compat {c=[1],r=[1],i=[7ff]}]
> > dumped fsmap epoch 205510
>
> Lars Köppel
> Developer
> Email: lars.koeppel@xxxxxxxxxx
> Phone: +49 6221 5993580
> ariadne.ai (Germany) GmbH
> Häusserstraße 3, 69115 Heidelberg
> Amtsgericht Mannheim, HRB 744040
> Geschäftsführer: Dr. Fabian Svara
> https://ariadne.ai
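(Regarding Paul's suggestion above and the skipped-beacon messages in the
quoted mail: a rough sketch of what "wedging it in a corner" could look like,
assuming a cephadm-managed cluster. The standby daemon name is taken from the
fsmap above, and the grace value is only an illustration that should be
reverted once the MDS is stable again.)

    # See which MDS daemons exist and which one is the standby.
    ceph fs status cephfs
    ceph orch ps | grep mds

    # Stop the standby so the rank has nowhere to migrate to mid-recovery.
    ceph orch daemon stop mds.cephfs.storage02.zopcif

    # Optionally give the mons more patience with missed beacons while the
    # MDS is busy in up:rejoin (the default grace is 15 seconds).
    ceph config set global mds_beacon_grace 600

    # Start the standby again once rank 0 has been stable in up:active.
    ceph orch daemon start mds.cephfs.storage02.zopcif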
> On Fri, Jan 5, 2024 at 7:52 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> > Hi Lars,
> >
> > On Fri, Jan 5, 2024 at 9:53 AM Lars Köppel <lars.koeppel@xxxxxxxxxx> wrote:
> > >
> > > Hello everyone,
> > >
> > > we are running a small cluster with 3 nodes and 25 OSDs per node, on
> > > Ceph version 17.2.6.
> > > Recently the active MDS crashed, and since then the newly starting MDS
> > > has always been in the up:replay state. In the output of the command
> > > 'ceph tell mds.cephfs:0 status' you can see that the journal is
> > > completely read in. As soon as it is finished, the MDS crashes and the
> > > next one starts reading the journal.
> > >
> > > At the moment I have the journal inspection running
> > > ('cephfs-journal-tool --rank=cephfs:0 journal inspect').
> > >
> > > Does anyone have any further suggestions on how I can get the cluster
> > > running again as quickly as possible?
> >
> > Please review:
> >
> > https://docs.ceph.com/en/reef/cephfs/troubleshooting/#stuck-during-recovery
> >
> > Note: your MDS is probably not failing in up:replay but shortly after
> > reaching one of the later states. Check the mon logs to see what the
> > FSMap changes were.
> >
> > Patrick Donnelly, Ph.D.
> > He / Him / His
> > Red Hat Partner Engineer
> > IBM, Inc.
> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx