Re: CephFS metadata pool grows by two orders of magnitude while trimming (?) snapshots

Hi Janek,

On Mon, Jun 12, 2023 at 5:31 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> Good news: We haven't had any new fill-ups so far. On the contrary, the
> pool size is as small as it's ever been (200GiB).

Great!

> Bad news: The MDS are still acting strangely. I have very uneven session
> load and I don't know where it comes from. ceph_mds_sessions_total_load
> reports about 1.4 million on mds.3, whereas all the other ranks are
> mostly idle. I checked the client list on that rank, but the heaviest
> client has about 8k caps, which isn't very much at all. Most have 0 or
> 1. I don't see any blocked ops in flight. I don't think this has to do
> with the disabled balancer, because I've seen this pattern before.

That's interesting... I don't have an explanation.
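
If you want to poke at it a bit more, a rough sketch of what I would look
at on that rank (exact command form and output fields vary a little
between releases):

  # per-client sessions on rank 3, including cap counts and request load
  ceph tell mds.3 session ls
  # anything actually stuck on that rank
  ceph tell mds.3 dump_ops_in_flight
  ceph tell mds.3 dump_blocked_ops
  # raw counters, including the mds_sessions section that metric is
  # derived from (if I remember the section name right)
  ceph tell mds.3 perf dump

The session listing should show a per-client request load figure next to
the cap count, which might point at a client issuing lots of cheap
requests without holding many caps.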

> The event log size of three of the five MDS is also still very high.
> mds.1, mds.3, and mds.4 report between 4 and 5 million events, mds.0
> around 1.4 million, and mds.2 between 0 and 200,000. The numbers have
> been constant since my last MDS restart four days ago.
>
> I ran your ceph-gather.sh script a couple of times, but it only dumps
> mds.0. Should I modify it to dump mds.3 instead so you can have a look?

Yes, please.
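
(Assuming the script just drives everything through "ceph tell mds.0 ...",
pointing it at rank 3 should only be a matter of changing the target. In
the meantime, something like

  ceph tell mds.3 perf dump

and comparing the ev/evtrm and seg/segtrm counters in the mds_log section
between samples would show whether events and segments are being trimmed
at all on that rank.)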

> Janek
>
>
> On 10/06/2023 15:23, Patrick Donnelly wrote:
> > On Fri, Jun 9, 2023 at 3:27 AM Janek Bevendorff
> > <janek.bevendorff@xxxxxxxxxxxxx> wrote:
> >> Hi Patrick,
> >>
> >>> I'm afraid your ceph-post-file logs were lost to the nether. AFAICT,
> >>> our ceph-post-file storage has been non-functional since the beginning
> >>> of the lab outage last year. We're looking into it.
> >> I have it here still. Any other way I can send it to you?
> > Nevermind, I found the machine it was stored on. It was a
> > misconfiguration caused by post-lab-outage rebuilds.
> >
> >>> Extremely unlikely.
> >> Okay, taking your word for it. But something seems to be stalling
> >> journal trimming. We had a similar thing yesterday evening, but at a
> >> much smaller scale and without a noticeable pool size increase. I only
> >> got an alert that the ceph_mds_log_ev Prometheus metric had started
> >> going up again for a single MDS. It grew past 1M events, so I restarted
> >> it. I also restarted the other MDS and they all immediately jumped to
> >> above 5M events and stayed there. They are, in fact, still there and
> >> have decreased only very slightly this morning. The pool size is
> >> totally within a normal range, though, at 290GiB.
> > Please keep monitoring it. I don't think yours is the only cluster
> > experiencing this.
> >
> >>> So clearly (a) an incredible number of journal events are being logged
> >>> and (b) trimming is slow or unable to make progress. I'm looking into
> >>> why but you can help by running the attached script when the problem
> >>> is occurring so I can investigate. I'll need a tarball of the outputs.
> >> How do I send it to you if not via ceph-post-file?
> > It should be working again sometime next week. We're moving the
> > drop.ceph.com service to a standalone VM soonish.
> >
> >>> Also, in the off-chance this is related to the MDS balancer, please
> >>> disable it since you're using ephemeral pinning:
> >>>
> >>> ceph config set mds mds_bal_interval 0
> >> Done.
> >>
> >> Thanks for your help!
> >> Janek
> >>
> >>
> >> --
> >>
> >> Bauhaus-Universität Weimar
> >> Bauhausstr. 9a, R308
> >> 99423 Weimar, Germany
> >>
> >> Phone: +49 3643 58 3577
> >> www.webis.de
> >>
> >
> --
>
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



