CephFS metadata pool grows by two orders of magnitude while trimming (?) snapshots

Hi,

Perhaps this is a known issue and I was simply too dumb to find it, but we are having problems with our CephFS metadata pool filling up overnight.

Our cluster has a small SSD pool of around 15TB which hosts our CephFS metadata pool. Usually, that's more than enough. The normal size of the pool ranges between 200 and 800GiB (which is quite a lot of fluctuation already). Yesterday, the pool suddenly filled up entirely, and the only way to fix it was to add more capacity. I increased the pool size to 18TB by adding more SSDs, which resolved the problem. After a couple of hours of reshuffling, the pool size finally went back to 230GiB.
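(For reference, the usage figures above are just what the usual commands report for the metadata pool; roughly what I keep an eye on, with our actual pool name in the output:)

    # per-pool usage, including the CephFS metadata pool
    ceph df detail
    # fill level of the individual SSD OSDs backing that pool
    ceph osd df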

But then we had another fill-up tonight to 7.6TiB. Luckily, I had adjusted the weights so that not all disks could fill up entirely like last time, so it ended there.
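(For concreteness, this is the kind of reweighting I mean; the OSD ID and value below are only placeholders:)

    # temporarily lower the reweight of a nearly full OSD
    ceph osd reweight 42 0.85
    # check the resulting distribution
    ceph osd df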

I wasn't really able to identify the problem yesterday, but in the more controlled situation today, I was able to check the MDS logs at debug_mds=10, and it looks to me like the problem is caused by snapshot trimming. The logs contain a lot of snapshot-related messages for paths that haven't been touched in a long time. The messages all look something like this:

May 31 09:16:48 XXX ceph-mds[2947525]: 2023-05-31T09:16:48.292+0200 7f7ce1bd9700 10 mds.1.cache.ino(0x1000b3c3670) add_client_cap first cap, joining realm snaprealm(0x10000000000 seq 1b1c lc 1b1b cr 1b1b cps 2 snaps={185f=snap(185f 0x10000000000 'monthly_20221201' 2022-12-01T00:00:01.530830+0100),18de=snap(18de 0x10000000000 'monthly_20230101' 2023-01-01T00:00:04.657252+0100),1941=snap(1941 0x10000000000 ...

May 31 09:25:03 XXX ceph-mds[3268481]: 2023-05-31T09:25:03.396+0200 7f0e6a6ca700 10 mds.0.cache | |______ 3     rep [dir 0x100000218fe.101111101* /storage/REDACTED/| ptrwaiter=0 request=0 child=0 frozen=0 subtree=1 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x5607759d9600]

May 31 09:25:03 XXX ceph-mds[3268481]: 2023-05-31T09:25:03.452+0200 7f0e6a6ca700 10 mds.0.cache | | |____ 4     rep [dir 0x100000ff904.100111101010* /storage/REDACTED/| ptrwaiter=0 request=0 child=0 frozen=0 subtree=1 importing=0 replicated=0 waiter=0 authpin=0 tempexporting=0 0x56034ed25200]

May 31 09:25:03 XXX ceph-mds[3268481]: 2023-05-31T09:25:03.716+0200 7f0e6becd700 10 mds.0.server set_trace_dist snaprealm snaprealm(0x10000000000 seq 1b1c lc 1b1b cr 1b1b cps 2 snaps={185f=snap(185f 0x10000000000 'monthly_20221201' 2022-12-01T00:00:01.530830+0100),18de=snap(18de 0x10000000000 'monthly_20230101' 2023-01-01T00:00:04.657252+0100),1941=snap(1941 0x10000000000 'monthly_20230201' 2023-02-01T00:00:01.854059+0100),19a6=snap(19a6 0x10000000000 'monthly_20230301' 2023-03-01T00:00:01.215197+0100),1a24=snap(1a24 0x10000000000 'monthly_20230401'  ...) len=384

May 31 09:25:36 XXX ceph-mds[3268481]: 2023-05-31T09:25:36.076+0200 7f0e6becd700 10 mds.0.cache.ino(0x10004d74911) remove_client_cap last cap, leaving realm snaprealm(0x10000000000 seq 1b1c lc 1b1b cr 1b1b cps 2 snaps={185f=snap(185f 0x10000000000 'monthly_20221201' 2022-12-01T00:00:01.530830+0100),18de=snap(18de 0x10000000000 'monthly_20230101'  ...)

The daily_*, monthly_*, etc. names are the names of our regular snapshots.
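In case anyone wants to reproduce this kind of log output: the debug level mentioned above can be raised roughly like this, and the snapshots are plain CephFS snapshots, i.e. directories under .snap (the MDS name and mount point below are placeholders):

    # raise MDS log verbosity via the config database
    ceph config set mds debug_mds 10
    # or only for a single running daemon
    ceph tell mds.<name> injectargs '--debug_mds=10'
    # list the snapshot names (daily_*, monthly_*, ...)
    ls /mnt/cephfs/.snap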

I posted a larger log file snippet using ceph-post-file with the ID: da0eb93d-f340-4457-8a3f-434e8ef37d36

Is it possible that the MDS are trimming old snapshots without taking care not to fill up the entire metadata pool?
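To see whether snapshot trimming is actually running while the pool grows, I would expect something along these lines to show it, although I'm not sure it catches everything the MDS does internally:

    # snaptrim / snaptrim_wait show up in the PG state summary
    ceph status
    # count PGs currently trimming snapshots
    ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim
    # and watch the metadata pool usage in parallel
    ceph df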

Cheers
Janek
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



