Re: CephFS metadata pool grows by two orders of magnitude while trimming (?) snapshots

Good news: We haven't had any new fill-ups so far. On the contrary, the pool size is as small as it's ever been (200GiB).
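
For reference, I'm reading the pool size straight off ceph df, along these lines (the grep simply assumes the pool name contains "metadata"; adjust as needed):

ceph df detail | grep -E 'POOL|metadata'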

Bad news: The MDS are still acting strangely. I have a very uneven session load and I don't know where it comes from. ceph_mds_sessions_total_load reports about 1.4 million on mds.3, whereas all the others are mostly idle. I checked the client list on that rank, but the heaviest client has about 8k caps, which isn't very much at all. Most have 0 or 1. I don't see any blocked ops in flight. I don't think this has anything to do with the disabled balancer, because I've seen this pattern before.
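
In case it's useful, this is roughly how I've been inspecting that rank (mds.3 here stands for whatever daemon currently holds rank 3; the jq filter is only an illustration, and on older releases these commands may need to go through ceph daemon on the MDS host instead of ceph tell):

# sessions on the busy rank, sorted by cap count
ceph tell mds.3 session ls | jq -r '.[] | "\(.id) \(.num_caps)"' | sort -k2 -nr | head

# anything stuck?
ceph tell mds.3 dump_ops_in_flight
ceph tell mds.3 dump_blocked_ops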

The event log size of three of the five MDS is also still very high. mds.1, mds.3, and mds.4 report between 4 and 5 million events, mds.0 around 1.4 million, and mds.2 between 0 and 200,000. The numbers have been constant since my last MDS restart four days ago.
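
Those numbers are from the ceph_mds_log_ev metric; they match what the daemons themselves report. A rough sketch of how I cross-check them (counter names as they appear in our perf dumps; ceph daemon on the MDS host works too if ceph tell doesn't accept perf dump on your release):

for rank in 0 1 2 3 4; do
    echo "mds.$rank:"
    ceph tell mds.$rank perf dump mds_log | \
        jq '.mds_log | {ev, seg, expos, wrpos}'
done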

I ran your ceph-gather.sh script a couple of times, but it only dumps mds.0. Should I modify it to dump mds.3 instead so you can have a look?

Janek


On 10/06/2023 15:23, Patrick Donnelly wrote:
> On Fri, Jun 9, 2023 at 3:27 AM Janek Bevendorff
> <janek.bevendorff@xxxxxxxxxxxxx> wrote:
>> Hi Patrick,
>>
>>> I'm afraid your ceph-post-file logs were lost to the nether. AFAICT,
>>> our ceph-post-file storage has been non-functional since the beginning
>>> of the lab outage last year. We're looking into it.
>> I have it here still. Any other way I can send it to you?
> Nevermind, I found the machine it was stored on. It was a
> misconfiguration caused by post-lab-outage rebuilds.

>>> Extremely unlikely.
>> Okay, taking your word for it. But something seems to be stalling
>> journal trimming. We had a similar thing yesterday evening, but at a much
>> smaller scale and without a noticeable pool size increase. I only got an alert
>> that the ceph_mds_log_ev Prometheus metric had started going up again for a
>> single MDS. It grew past 1M events, so I restarted it. I also restarted
>> the other MDS and they all immediately jumped to above 5M events and
>> stayed there. They are, in fact, still there and have decreased only
>> very slightly in the morning. The pool size is totally within a normal
>> range, though, at 290GiB.
> Please keep monitoring it. I think you're not the only cluster to
> experience this.

>>> So clearly (a) an incredible number of journal events are being logged
>>> and (b) trimming is slow or unable to make progress. I'm looking into
>>> why, but you can help by running the attached script when the problem
>>> is occurring so I can investigate. I'll need a tarball of the outputs.
>> How do I send it to you if not via ceph-post-file?
> It should work again next week. We're moving the drop.ceph.com service
> to a standalone VM soonish.

>>> Also, on the off chance this is related to the MDS balancer, please
>>> disable it, since you're using ephemeral pinning:
>>>
>>> ceph config set mds mds_bal_interval 0
>> Done.
>>
>> Thanks for your help!
>> Janek


--

Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany

Phone: +49 3643 58 3577
www.webis.de
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



