rbd mirror snapshot trash

Good morning,

I would be grateful if anybody could shed some light on this; I can't reproduce it in my lab clusters, so I'm hoping the community can help. A customer has two clusters with rbd mirroring (snapshot-based) enabled, and it seems to work fine: they run regular checks and the images on the remote site are synced correctly. While investigating a different issue (which could be related, though) we noticed that there are quite a lot of snapshots in the trash namespace, but for unknown reasons they are not purged. This is some example output from the remote site (the primary site looks similar):

---snip---
# rbd snap ls --all <pool>/<image>
SNAPID     NAME                                          SIZE    PROTECTED  TIMESTAMP                 NAMESPACE
 23962608  42c2e4c3-e1ab-480f-9d29-aab5555c751b          20 GiB             Tue Sep 20 20:33:18 2022  trash (.mirror.non_primary.61ef3a1e-6e5b-4147-ac62-552e4776dd70.ef345a74-2585-48ac-8e25-09b15c20c877)
 96796437  24cc96d4-51c6-447e-870d-1545d8ec9308          20 GiB             Tue Apr 18 12:15:28 2023  trash (.mirror.non_primary.61ef3a1e-6e5b-4147-ac62-552e4776dd70.a6ae46e4-cbb5-4264-b2e4-6b565820aa16)
110025019  .mirror.non_primary.61ef3a1e-6e5b-4147-ac62-552e4776dd70.0fc73e88-6671-4ba8-a118-8fcbcd422c22  20 GiB  Mon May 15 16:27:39 2023  mirror (non-primary peer_uuids:[] 18ac37b9-a6c6-4f64-ba43-a1f0d02b3f96:115986586 copied)
---snip---
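In case it helps, this is roughly how I've been collecting the affected images on one site — just a loop over the same command, nothing fancy; the pool name is a placeholder:

---snip---
# list every image in the pool that still has snapshots in the trash namespace
POOL="<pool>"    # placeholder: replace with the actual pool name
for IMG in $(rbd ls "$POOL"); do
    # --all also lists the trash and mirror namespaces
    if rbd snap ls --all "$POOL/$IMG" | grep -q 'trash (' ; then
        echo "$IMG has trash-namespace snapshots"
    fi
done
---snip---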

They're on the latest Octopus release and use all config defaults. There are more than 500 images in that pool, with a snapshot schedule of 3 minutes for each image. A couple of questions arose, and hopefully at least some of them can be answered:

- Why are there snapshots in the trash namespace dating back to September 2022, why aren't they cleaned up automatically, and how can they be cleaned up? Not all images have trash entries, though.
- Could disabling mirroring for those images (and then re-enabling it) help get rid of the trash? (A rough sketch of what I have in mind follows after this list.)
- Is the default osd_max_trimming_pgs = 2 a bottleneck here? They have week-old SST files in the store.db, leading to long startup times for the MONs after a reboot/restart. Would increasing that value help get rid of the trash entries and maybe also trim the MON store?
- Regarding the general snapshot mirroring procedure: the default of rbd_mirroring_max_mirroring_snapshots is 5, but I assume the number of active snapshots would only grow in case of a disruption between the sites, correct? If the sync works, there's no need to keep 5 snapshots, right?
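For the disable/re-enable question, I imagine it would look roughly like this per image — placeholders again, untested on their clusters, and I'd first want to confirm it doesn't trigger a full resync:

---snip---
# per image: disable mirroring (which should also drop its mirror/trash snapshots),
# then re-enable snapshot-based mirroring -- placeholders, not run in production yet
rbd mirror image disable <pool>/<image>
rbd mirror image enable <pool>/<image> snapshot

# current values of the options mentioned above (they haven't changed the defaults)
ceph config get osd osd_max_trimming_pgs
ceph config get client rbd_mirroring_max_mirroring_snapshots

# per-image mirroring state and the schedule status on that pool
rbd mirror image status <pool>/<image>
rbd mirror snapshot schedule status --pool <pool>
---snip---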

I'm still looking into these things myself but I'd appreciate anyone chiming in here.

Thanks!
Eugen
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


