Good morning,
I would be grateful if anybody could shed some light on this. I can't
reproduce it in my lab clusters, so I was hoping the community could help.
A customer has two clusters with rbd mirroring (snapshot-based) enabled.
It seems to work fine: they run regular checks and the images on the
remote site are synced correctly. While investigating a different
issue (which could be related, though) we noticed that there are quite
a lot of snapshots in the trash namespace, but for unknown reasons they
are not purged. This is some example output from the remote site (the
primary site looks similar):
---snip---
# rbd snap ls --all <pool>/<image>
SNAPID     NAME                                                                                           SIZE    PROTECTED  TIMESTAMP                 NAMESPACE
23962608   42c2e4c3-e1ab-480f-9d29-aab5555c751b                                                           20 GiB             Tue Sep 20 20:33:18 2022  trash (.mirror.non_primary.61ef3a1e-6e5b-4147-ac62-552e4776dd70.ef345a74-2585-48ac-8e25-09b15c20c877)
96796437   24cc96d4-51c6-447e-870d-1545d8ec9308                                                           20 GiB             Tue Apr 18 12:15:28 2023  trash (.mirror.non_primary.61ef3a1e-6e5b-4147-ac62-552e4776dd70.a6ae46e4-cbb5-4264-b2e4-6b565820aa16)
110025019  .mirror.non_primary.61ef3a1e-6e5b-4147-ac62-552e4776dd70.0fc73e88-6671-4ba8-a118-8fcbcd422c22  20 GiB             Mon May 15 16:27:39 2023  mirror (non-primary peer_uuids:[] 18ac37b9-a6c6-4f64-ba43-a1f0d02b3f96:115986586 copied)
---snip---
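To get an overview of which images are affected, a listing like the one
above could be tallied per namespace. This is only a minimal sketch: it
assumes the JSON form of the listing (rbd snap ls --all --format json)
is a list of objects whose "namespace" field is an object with a "type"
key, which I have not verified against this Octopus build. The function
name and the sample data are made up for illustration; only the snapshot
IDs are taken from the output above.

```python
import json
from collections import Counter


def count_by_namespace(snap_json):
    """Count snapshots per namespace type in a JSON snapshot listing."""
    counts = Counter()
    for snap in json.loads(snap_json):
        ns = snap.get("namespace", {})
        # ASSUMPTION: "namespace" is an object carrying a "type" key.
        # Fall back to the raw value if it turns out to be a plain string.
        ns_type = ns.get("type", "unknown") if isinstance(ns, dict) else str(ns)
        counts[ns_type] += 1
    return counts


# Hypothetical sample mimicking the three rows shown above.
sample = json.dumps([
    {"id": 23962608, "namespace": {"type": "trash"}},
    {"id": 96796437, "namespace": {"type": "trash"}},
    {"id": 110025019, "namespace": {"type": "mirror"}},
])

print(dict(count_by_namespace(sample)))  # {'trash': 2, 'mirror': 1}
```

Fed with the real JSON output for each image in the pool, this would
show which images carry trash entries and how many.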
They're on the latest Octopus release and use all config defaults. There
are more than 500 images in that pool, each with a snapshot schedule of
3 minutes.
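For a sense of scale, here is a rough back-of-the-envelope calculation
of the snapshot churn this schedule implies. It assumes exactly 500
images (so it is a lower bound) and that every scheduled snapshot is
actually taken; the function name is just for illustration.

```python
# Rough estimate of mirror-snapshot churn, using the figures from the
# description above.
IMAGES = 500            # "more than 500", so this is a lower bound
INTERVAL_MINUTES = 3    # snapshot schedule per image


def snapshots_per_day(images, interval_minutes):
    """Number of scheduled snapshots created per day across all images."""
    return images * (24 * 60 // interval_minutes)


print(snapshots_per_day(IMAGES, INTERVAL_MINUTES))  # 240000
```

If the snapshot count per image is to stay bounded, roughly the same
number of snapshots has to be removed and trimmed every day.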
A couple of questions arose, and hopefully at least some of them can
be answered:
- Why are there snapshots in the trash namespace from September 2022,
how can they be cleaned up and why aren't they cleaned up
automatically? Not all images have trash entries though.
- Could disabling mirroring for those images and then re-enabling it
help get rid of the trash?
- Is the default osd_max_trimming_pgs = 2 a bottleneck here? They have
week-old SST files in the store.db, leading to long startup times
for the MONs after a reboot/restart. Would increasing that value help
get rid of the trash entries and maybe also trim the mon store?
- Regarding the general snapshot mirroring procedure, the default of
rbd_mirroring_max_mirroring_snapshots is 5, but I assume that the
number of active snapshots would only grow in case of a disruption
between those sites, correct? If the sync works there's no need to
keep 5 snapshots, right?
I'm still looking into these things myself but I'd appreciate anyone
chiming in here.
Thanks!
Eugen