ceph fs snaptrim catch-up

Frank Schilder <frans@xxxxxx> · Thu, 24 Feb 2022 12:25:08 +0000

Hi all,

I have another counter-intuitive observation regarding snaptrim. Our users are currently cleaning up a lot and also moving an exceptional amount of data around. As a result, snaptrim cannot complete within a 24h window (we take daily global snapshots). Within that window it currently manages to process about 50% of the PGs. This means that PGs get scheduled for snaptrim again while other PGs are still in snaptrim_wait state.
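For anyone who wants to check the same numbers on their own cluster, here is a minimal Python sketch that summarises the output of "ceph pg dump pgs --format json". The function name summarize_snaptrim is my own, and the JSON layout is an assumption: I believe each PG entry carries "state" and "snaptrimq_len" fields, but where the PG list sits in the output (top level, under "pg_stats", or under "pg_map") differs between releases, so adjust as needed.

```python
import json


def summarize_snaptrim(pg_dump_json):
    """Return (#PGs in snaptrim, #PGs in snaptrim_wait, max snaptrimq_len).

    pg_dump_json is the text produced by `ceph pg dump pgs --format json`.
    Assumption: each PG entry has "state" and "snaptrimq_len" keys; the
    location of the PG list varies by release, so several are tried.
    """
    data = json.loads(pg_dump_json)
    if isinstance(data, dict):
        data = data.get("pg_map", data)   # some releases nest under "pg_map"
        stats = data.get("pg_stats", [])
    else:
        stats = data                      # other releases return a bare list

    trimming = waiting = max_len = 0
    for pg in stats:
        # The state is a "+"-joined string such as "active+clean+snaptrim_wait".
        flags = pg.get("state", "").split("+")
        if "snaptrim" in flags:
            trimming += 1
        if "snaptrim_wait" in flags:
            waiting += 1
        max_len = max(max_len, pg.get("snaptrimq_len", 0))
    return trimming, waiting, max_len
```

Feeding it the dump once a day gives the snaptrim / snaptrim_wait counts and the maximum queue length, which is enough to track the trend.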

The surprising observation is that the maximum of snaptrimq_len over all PGs is constantly increasing. Since about 50% of the PGs can be processed, I would expect it to settle at a low value, or at least to increase much more slowly than +2 per day: the PGs with the longest queue ought to be scheduled first, and if at least half of the PGs get trimmed each day, the maximum queue length should not increase every day.
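To illustrate the reasoning, here is a toy simulation in Python. It is not a model of the actual OSD scheduler. The assumptions are mine: every PG gains exactly one queue entry per day, half of the PGs can be trimmed per day, and a trimmed PG's queue is emptied completely. The names simulate, longest_first and fixed_order are hypothetical. fixed_order stands in for any policy that ignores queue length and keeps favouring the same PGs.

```python
def simulate(num_pgs, days, pick):
    """Return the max queue length over all PGs at the end of each day.

    Toy assumptions: each PG gains one queue entry per day, and the
    scheduler fully trims num_pgs // 2 PGs per day, chosen by `pick`.
    """
    queues = [0] * num_pgs
    capacity = num_pgs // 2
    history = []
    for day in range(days):
        queues = [q + 1 for q in queues]        # new daily snapshot to trim
        for pg in pick(queues, capacity, day):
            queues[pg] = 0                      # this PG is trimmed completely
        history.append(max(queues))
    return history


def longest_first(queues, capacity, day):
    """Pick the PGs with the longest queues."""
    order = sorted(range(len(queues)), key=lambda pg: queues[pg], reverse=True)
    return order[:capacity]


def fixed_order(queues, capacity, day):
    """Ignore queue length and always pick the same first half of the PGs."""
    return list(range(capacity))


if __name__ == "__main__":
    print("longest first:", simulate(8, 10, longest_first))
    print("fixed order:  ", simulate(8, 10, fixed_order))
```

In this toy model the maximum stays at 1 with longest-first scheduling, while it grows by one every day with the fixed order. A steadily growing maximum is therefore what one would expect from a scheduler that does not take queue length into account.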

My guess is that this is not the case. The PGs are *not* scheduled according to "longest queue first". Is this correct? If so, is it worth making this a feature request?

In line with that, I also observed that degraded PGs do not seem to be scheduled for recovery according to "most degraded first". A PG with 2 shards missing should be recovered before any PG with only 1 shard missing, but that's not what I saw. I had to raise the priority manually by forcing recovery. Is this expected?

All observations were made with mimic.

PS: No worries about the snaptrim catch up, it will happen eventually. The current workload is only temporary.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14