Re: mds crashes after up:replay state

Hi Paul,

Your suggestion was correct. The MDS went through the replay state and was
in the active state for a few minutes. But then it got killed because of too
high memory consumption.

> @mds.cephfs.storage01.pgperp.service: Main process exited, code=exited,
> status=137/n/a

How can I raise the memory limit for the MDS?

From the looks of it in htop, there seems to be a memory leak: the process
consumed over 200 GB of memory while reporting that it actually used only
20 - 30 GB.
Is this possible?
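
What I have found so far, though I am not sure these are the right knobs
(exit status 137 is SIGKILL, so I assume either the kernel OOM killer or a
systemd MemoryMax limit kicked in, and as far as I understand
mds_cache_memory_limit is only a cache target that the MDS can overshoot
during replay/rejoin):

    # show the current MDS cache memory target (default 4 GiB)
    ceph config get mds mds_cache_memory_limit

    # raise the target cluster-wide, e.g. to 32 GiB (value in bytes)
    ceph config set mds mds_cache_memory_limit 34359738368

    # compare the daemon's own view of its heap with what the OS reports
    ceph tell mds.cephfs:0 heap stats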

Best regards
Lars


Lars Köppel
Developer
Email: lars.koeppel@xxxxxxxxxx
Phone: +49 6221 5993580
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai


On Sat, Jan 6, 2024 at 3:33 PM Paul Mezzanini <pfmeec@xxxxxxx> wrote:

> I'm replying from my phone, so hopefully this works well.  This sounds
> suspiciously similar to an issue we have run into where there is an
> internal loop in the MDS that doesn't have a heartbeat in it. If that loop
> runs for too long, the daemon is marked as failed and the process jumps to
> another server and starts again.
>
> We get around it by "wedging it in a corner" and removing the ability to
> migrate. This is as simple as stopping all standby MDS services and just
> waiting for the MDS to complete.
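>
> For example, if the daemons are managed by cephadm (daemon name taken from
> the standby shown in your fsmap; with plain packages, stop the matching
> systemd unit instead), that could look like:
>
>     # park the standby so the rank has nowhere to fail over to
>     ceph orch daemon stop mds.cephfs.storage02.zopcif
>
>     # bring it back once the surviving MDS is stable in up:active
>     ceph orch daemon start mds.cephfs.storage02.zopcif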
>
>
>
> --
>
> Paul Mezzanini
> Platform Engineer III
> Research Computing
>
> Rochester Institute of Technology
>
>  Sent from my phone, please excuse typos and brevity
> ------------------------------
> *From:* Lars Köppel <lars.koeppel@xxxxxxxxxx>
> *Sent:* Saturday, January 6, 2024 7:22:14 AM
> *To:* Patrick Donnelly <pdonnell@xxxxxxxxxx>
> *Cc:* ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> *Subject:*  Re: mds crashes after up:replay state
>
> Hi Patrick,
>
> thank you for your response.
> I already changed the mentioned settings, but I had no luck with this.
>
> The journal inspection I had running yesterday finished with: 'Overall
> journal integrity: OK'.
> So you are probably right that the MDS is crashing shortly after the replay
> finishes.
>
> I checked the logs, and every few seconds there is a new FSMap epoch without
> any visible changes. One of the current epochs is at the end of this mail.
> Is there anything useful in it?
>
> When the replay is finished, the running MDS goes to the state
> 'up:reconnect' and, a second later, to the state 'up:rejoin'. After this
> there is no new FSMap for ~20 min, until this message pops up:
>
> > Jan 06 12:38:23 storage01 ceph-mds[223997]:
> > mds.beacon.cephfs.storage01.pgperp Skipping beacon heartbeat to monitors
> > (last acked 4.00012s ago); MDS internal heartbeat is not healthy!
> >
> A few seconds later (while the heartbeat message is still appearing) a new
> FSMap is created, with a new MDS now in the replay state.
> The last of the heartbeat messages comes after 1446 seconds. Then it is gone
> and no more warnings or errors are shown at this point. One minute after
> the last message the MDS is back as a standby MDS.
>
> > Jan 06 13:02:26 storage01 ceph-mds[223997]:
> > mds.beacon.cephfs.storage01.pgperp Skipping beacon heartbeat to monitors
> > (last acked 1446.6s ago); MDS internal heartbeat is not healthy!
> >
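>
> As far as I understand it, the monitors replace an MDS once they have not
> received a beacon from it for mds_beacon_grace seconds (15 by default), so
> a possible stop-gap while it recovers might be to raise that a lot, e.g.
> (option names as I understand them, please correct me if they are wrong):
>
>     ceph config set mon mds_beacon_grace 3600
>     ceph config set mds mds_beacon_grace 3600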
>
> Also, I cannot find any warning in the logs when the MDS crashes. What can
> I do to find the cause of the crash?
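>
> Would something along these lines be the right way to dig further, or is
> there a better approach? (I am not sure these are the most useful knobs.)
>
>     # check whether the crash module captured a backtrace
>     ceph crash ls
>     ceph crash info <crash-id>
>
>     # temporarily raise MDS logging before the next attempt
>     ceph config set mds debug_mds 10
>     ceph config set mds debug_ms 1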
>
> Best regards
> Lars
>
> > e205510
> > enable_multiple, ever_enabled_multiple: 1,1
> > default compat: compat={},rocompat={},incompat={1=base v0.20,2=client
> > writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> > object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> > anchor table,9=file layout v2,10=snaprealm v2}
> > legacy client fscid: 3
> >
> > Filesystem 'cephfs' (3)
> > fs_name cephfs
> > epoch   205510
> > flags   32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
> > created 2023-06-06T11:44:03.651905+0000
> > modified        2024-01-06T10:28:14.676738+0000
> > tableserver     0
> > root    0
> > session_timeout 60
> > session_autoclose       300
> > max_file_size   8796093022208
> > required_client_features        {}
> > last_failure    0
> > last_failure_osd_epoch  42962
> > compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> > ranges,3=default file layouts on dirs,4=dir inode in separate object,
> > 5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses
> > inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> > max_mds 1
> > in      0
> > up      {0=2178448}
> > failed
> > damaged
> > stopped
> > data_pools      [11,12]
> > metadata_pool   10
> > inline_data     disabled
> > balancer
> > standby_count_wanted    1
> > [mds.cephfs.storage01.pgperp{0:2178448} state up:replay seq 4484
> > join_fscid=3 addr [v2:
> > 192.168.0.101:6800/855849996,v1:192.168.0.101:6801/855849996] compat
> > {c=[1],r=[1],i=[7ff]}]
> >
> >
> > Filesystem 'cephfs_recovery' (4)
> > fs_name cephfs_recovery
> > epoch   193460
> > flags   13 allow_snaps allow_multimds_snaps
> > created 2024-01-05T10:47:32.224388+0000
> > modified        2024-01-05T16:43:37.677241+0000
> > tableserver     0
> > root    0
> > session_timeout 60
> > session_autoclose       300
> > max_file_size   1099511627776
> > required_client_features        {}
> > last_failure    0
> > last_failure_osd_epoch  42904
> > compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> > ranges,3=default file layouts on dirs,4=dir inode in separate object,
> > 5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses
> > inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
> > max_mds 1
> > in      0
> > up      {}
> > failed
> > damaged 0
> > stopped
> > data_pools      [11,12]
> > metadata_pool   13
> > inline_data     disabled
> > balancer
> > standby_count_wanted    1
> >
> >
> > Standby daemons:
> >
> > [mds.cephfs.storage02.zopcif{-1:2356728} state up:standby seq 1
> > join_fscid=3 addr [v2:
> > 192.168.0.102:6800/3567764205,v1:192.168.0.102:6801/3567764205] compat
> > {c=[1],r=[1],i=[7ff]}]
> > dumped fsmap epoch 205510
> >
>
>
> Lars Köppel
> Developer
> Email: lars.koeppel@xxxxxxxxxx
> Phone: +49 6221 5993580
> ariadne.ai (Germany) GmbH
> Häusserstraße 3, 69115 Heidelberg
> Amtsgericht Mannheim, HRB 744040
> Geschäftsführer: Dr. Fabian Svara
> https://ariadne.ai
>
>
> On Fri, Jan 5, 2024 at 7:52 PM Patrick Donnelly <pdonnell@xxxxxxxxxx>
> wrote:
>
> > Hi Lars,
> >
> > On Fri, Jan 5, 2024 at 9:53 AM Lars Köppel <lars.koeppel@xxxxxxxxxx>
> > wrote:
> > >
> > > Hello everyone,
> > >
> > > we are running a small cluster with 3 nodes and 25 OSDs per node, on
> > > Ceph version 17.2.6.
> > > Recently the active MDS crashed and since then the newly starting MDS
> > > has always been in the up:replay state. In the output of the command
> > > 'ceph tell mds.cephfs:0 status' you can see that the journal is
> > > completely read in. As soon as it is finished, the MDS crashes and the
> > > next one starts reading the journal.
> > >
> > > At the moment I have the journal inspection running
> > > ('cephfs-journal-tool --rank=cephfs:0 journal inspect').
> > >
> > > Does anyone have any further suggestions on how I can get the cluster
> > > running again as quickly as possible?
> >
> > Please review:
> >
> >
> > https://docs.ceph.com/en/reef/cephfs/troubleshooting/#stuck-during-recovery
> >
> > Note: your MDS is probably not failing in up:replay but shortly after
> > reaching one of the later states. Check the mon logs to see what the
> > FSMap changes were.
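> >
> > For example (unit name depends on how the mons are deployed; with cephadm
> > the mon log also ends up in the journal):
> >
> >     # current FSMap epoch and MDS states
> >     ceph fs dump
> >
> >     # mon log on one of the monitor hosts; look for the MDS state
> >     # transitions and the reason given for the failover
> >     journalctl -u 'ceph-*@mon.*' --since "2 hours ago" | grep -i mds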
> >
> >
> > Patrick Donnelly, Ph.D.
> > He / Him / His
> > Red Hat Partner Engineer
> > IBM, Inc.
> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> >
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



