Another thing I just noticed is that the auto-scaler is trying to scale
the pool down to 128 PGs. That could also result in large fluctuations,
but this big?? In any case, it looks like a bug to me. Whatever is
happening here, there should be safeguards with regard to the pool's
capacity.
Here's the current state of the pool in ceph osd pool ls detail:
pool 110 'cephfs.storage.meta' replicated size 4 min_size 3 crush_rule 5
object_hash rjenkins pg_num 495 pgp_num 471 pg_num_target 128
pgp_num_target 128 autoscale_mode on last_change 1359013 lfor
0/1358620/1358618 flags hashpspool,nodelete stripe_width 0
expected_num_objects 3000000 recovery_op_priority 5 recovery_priority 2
application cephfs
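
A minimal sketch for reading the PG counters out of that line, assuming the
key/value layout of `ceph osd pool ls detail` shown above. The helper names
(`pg_fields`, `merge_pending`) are made up for illustration and are not part
of any Ceph tooling. The idea is that pg_num above pg_num_target suggests the
PG merge is still in progress:

```python
import re

# The pool line quoted above, joined into a single string.
POOL_LINE = (
    "pool 110 'cephfs.storage.meta' replicated size 4 min_size 3 crush_rule 5 "
    "object_hash rjenkins pg_num 495 pgp_num 471 pg_num_target 128 "
    "pgp_num_target 128 autoscale_mode on last_change 1359013 lfor "
    "0/1358620/1358618 flags hashpspool,nodelete stripe_width 0 "
    "expected_num_objects 3000000 recovery_op_priority 5 recovery_priority 2 "
    "application cephfs"
)

PG_KEYS = ("pg_num", "pgp_num", "pg_num_target", "pgp_num_target")


def pg_fields(line):
    """Return the integer PG counters found in one pool detail line."""
    fields = {}
    for key in PG_KEYS:
        # Require whitespace (or start of line) before the key so that
        # a key never matches inside a longer word.
        match = re.search(r"(?<!\S)" + re.escape(key) + r" (\d+)", line)
        if match:
            fields[key] = int(match.group(1))
    return fields


def merge_pending(fields):
    """True if pg_num is still above its target (hypothetical check)."""
    return fields["pg_num"] > fields["pg_num_target"]


fields = pg_fields(POOL_LINE)
remaining = fields["pg_num"] - fields["pg_num_target"]
print(fields)
print("merge pending:", merge_pending(fields), "PGs above target:", remaining)
```

Run against the line above, this reports pg_num still well above the target
of 128, which matches the auto-scaler being in the middle of a scale-down.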
Janek
On 31/05/2023 10:10, Janek Bevendorff wrote:
Forgot to add: We are still on Pacific (16.2.12).
On 31/05/2023 09:53, Janek Bevendorff wrote:
Hi,
Perhaps this is a known issue and I was simply too dumb to find it,
but we are having problems with our CephFS metadata pool filling up
overnight.
Our cluster has a small SSD pool of around 15TB which hosts our
CephFS metadata pool. Usually, that's more than enough. The normal
size of the pool ranges between 200 and 800GiB (which is quite a lot
of fluctuation already). Yesterday, we suddenly had the pool fill
up entirely and the only way to fix it was to add more capacity. I
increased the pool size to 18TB by adding more SSDs and could resolve
the problem. After a couple of hours of reshuffling, the pool size
finally went back to 230GiB.
But then we had another fill-up tonight to 7.6TiB. Luckily, I had
adjusted the weights so that not all disks could fill up entirely
like last time, so it ended there.
I wasn't really able to identify the problem yesterday, but under the
more controllable scenario today, I could check the MDS logs at
debug_mds=10 and to me it seems like the problem is caused by
snapshot trimming. The logs contain a lot of snapshot-related
messages for paths that haven't been touched in a long time. The
messages all look something like this:
May 31 09:16:48 XXX ceph-mds[2947525]: 2023-05-31T09:16:48.292+0200
7f7ce1bd9700 10 mds.1.cache.ino(0x1000b3c3670) add_client_cap first
cap, joining realm snaprealm(0x10000000000 seq 1b1c lc 1b1b cr 1b1b
cps 2 snaps={185f=snap(185f 0x10000000000 'monthly_20221201'
2022-12-01T00:00:01.530830+0100),18de=snap(18de 0x10000000000
'monthly_20230101' 2023-01-01T00:00:04.657252+0100),1941=snap(1941
0x10000000000 ...
May 31 09:25:03 XXX ceph-mds[3268481]: 2023-05-31T09:25:03.396+0200
7f0e6a6ca700 10 mds.0.cache | |______ 3 rep [dir
0x100000218fe.101111101* /storage/REDACTED/| ptrwaiter=0 request=0
child=0 frozen=0 subtree=1 replicated=0 dirty=0 waiter=0 authpin=0
tempexporting=0 0x5607759d9600]
May 31 09:25:03 XXX ceph-mds[3268481]: 2023-05-31T09:25:03.452+0200
7f0e6a6ca700 10 mds.0.cache | | |____ 4 rep [dir
0x100000ff904.100111101010* /storage/REDACTED/| ptrwaiter=0 request=0
child=0 frozen=0 subtree=1 importing=0 replicated=0 waiter=0
authpin=0 tempexporting=0 0x56034ed25200]
May 31 09:25:03 XXX ceph-mds[3268481]: 2023-05-31T09:25:03.716+0200
7f0e6becd700 10 mds.0.server set_trace_dist snaprealm
snaprealm(0x10000000000 seq 1b1c lc 1b1b cr 1b1b cps 2
snaps={185f=snap(185f 0x10000000000 'monthly_20221201'
2022-12-01T00:00:01.530830+0100),18de=snap(18de 0x10000000000
'monthly_20230101' 2023-01-01T00:00:04.657252+0100),1941=snap(1941
0x10000000000 'monthly_20230201'
2023-02-01T00:00:01.854059+0100),19a6=snap(19a6 0x10000000000
'monthly_20230301' 2023-03-01T00:00:01.215197+0100),1a24=snap(1a24
0x10000000000 'monthly_20230401' ...) len=384
May 31 09:25:36 deltaweb055 ceph-mds[3268481]:
2023-05-31T09:25:36.076+0200 7f0e6becd700 10
mds.0.cache.ino(0x10004d74911) remove_client_cap last cap, leaving
realm snaprealm(0x10000000000 seq 1b1c lc 1b1b cr 1b1b cps 2
snaps={185f=snap(185f 0x10000000000 'monthly_20221201'
2022-12-01T00:00:01.530830+0100),18de=snap(18de 0x10000000000
'monthly_20230101' ...)
The daily_*, monthly_* etc. names are the names of our regular snapshots.
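
To see which snapshots a given MDS log message refers to, something like the
following could be used. It is only a sketch: the regular expression is
inferred from the snap(...) entries in the excerpts above, not from any
documented MDS log format, and `snapshots_in` is a made-up helper name.
Entries that are cut off with "..." in the excerpt are skipped rather than
guessed at.

```python
import re

# One snaprealm excerpt from the log lines above, joined into a single string.
LOG_EXCERPT = (
    "snaprealm(0x10000000000 seq 1b1c lc 1b1b cr 1b1b cps 2 "
    "snaps={185f=snap(185f 0x10000000000 'monthly_20221201' "
    "2022-12-01T00:00:01.530830+0100),18de=snap(18de 0x10000000000 "
    "'monthly_20230101' 2023-01-01T00:00:04.657252+0100),1941=snap(1941 "
    "0x10000000000 'monthly_20230201' "
    "2023-02-01T00:00:01.854059+0100),19a6=snap(19a6 0x10000000000 "
    "'monthly_20230301' 2023-03-01T00:00:01.215197+0100),1a24=snap(1a24 "
    "0x10000000000 'monthly_20230401' ...) len=384"
)

# snap(<id> <realm inode> '<name>' <timestamp>), as it appears in the excerpt.
SNAP_RE = re.compile(
    r"snap\(([0-9a-f]+) 0x[0-9a-f]+ '([^']+)' "
    r"(\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{4})\)"
)


def snapshots_in(text):
    """Return (snap id, name, timestamp) for every complete snap(...) entry."""
    return SNAP_RE.findall(text)


snaps = snapshots_in(LOG_EXCERPT)
for snap_id, name, stamp in snaps:
    print(snap_id, name, stamp)
```

Feeding whole log lines through this and counting the names would show
whether the messages are dominated by a few old snapshots.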
I posted a larger log file snippet using ceph-post-file with the ID:
da0eb93d-f340-4457-8a3f-434e8ef37d36
Is it possible that the MDS are trimming old snapshots without taking
care not to fill up the entire metadata pool?
Cheers
Janek
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx