Hi,

We have a customer with an abnormally large number of "osd_snap / purged_snap_{pool}_{snapid}" keys in the monstore db: almost 40 million. Among other problems, it causes a very long mon synchronization on startup.

Our understanding is that the cause is that mirroring snapshot creation is very frequently interrupted in their environment, most likely due to connectivity issues between the sites. The assumption is based on the fact that they have a lot of rbd "trash" snapshots, which may happen when an rbd snapshot removal is interrupted. (A mirroring snapshot creation usually includes removal of some older snapshot, to keep the total number of the image's mirroring snapshots under the limit.) We removed all "trash" snapshots manually, so currently they have a limited number of "expected" snapshots, but the number of purged_snap keys is still just as large.

So, our understanding is that if an rbd snapshot creation is frequently interrupted, there is a chance it will be interrupted in or just after SnapshotCreateRequest::send_allocate_snap_id [1], when it requests a new snap id from the mon. As a result this id is never tracked by rbd and never removed, and snap id holes like this make the "purged_snap_{pool}_{snapid}" ranges never merge.

To confirm that this scenario is likely, I ran the following simple test that interrupted rbd mirror snapshot creation at random times:

  for i in `seq 500`; do
    rbd mirror image snapshot test &
    PID=$!
    sleep $((RANDOM % 5)).$((RANDOM % 10))
    kill $PID && sleep 30
  done

Running this with debug_rbd=30, I see from the rbd client logs that it was interrupted in send_allocate_snap_id 74 times, which is (surprisingly) very high. And after the experiment, and after removing the rbd image with all tracked snapshots (i.e. having the pool with no known rbd snapshots), I still see "purged_snap_{pool}_{snapid}" keys for ranges that I believe will never be merged.

So the questions are:

1) Is there a way we could improve this to avoid the monstore growing this large?

2) How can we fix the current situation in the cluster? Would it be safe enough to just run `ceph-kvstore-tool rocksdb store.db rm-prefix osd_snap` to remove all osd_snap keys (including the purged_epoch keys)? Due to the large db size I don't think it would be feasible to selectively remove keys with the `ceph-kvstore-tool rocksdb store.db rm {prefix} {key}` command, so we would have to use the `rm-prefix` command (a rough sketch of the procedure I have in mind is in the P.S. below). Looking at the code, and actually trying it in a test environment, it seems like it could work, but am I missing something dangerous here?

If (1) is not possible, then maybe we could provide a tool/command for users to clean up the keys if they observe this issue?

[1] https://github.com/ceph/ceph/blob/e45272df047af71825445aeb6503073ba06123b0/src/librbd/operation/SnapshotCreateRequest.cc#L185

Thanks,

--
Mykola Golub
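
P.S. For anyone who wants to check how many of these keys their own mon store has accumulated, something along these lines should work (a sketch only: the store path and systemd unit below are the defaults and will differ per deployment, and the mon has to be stopped while ceph-kvstore-tool has the db open):

  # Stop the mon so ceph-kvstore-tool can open its rocksdb store.
  systemctl stop ceph-mon@$(hostname -s)

  # List the keys under the "osd_snap" prefix and count the purged_snap ones.
  ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db \
      list osd_snap | grep -c purged_snap_

  systemctl start ceph-mon@$(hostname -s)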
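
And for (2), this is roughly what I tried in the test environment (again only a sketch with illustrative paths, and it of course assumes the rm-prefix approach turns out to be safe):

  systemctl stop ceph-mon@$(hostname -s)

  # Drop every key under the "osd_snap" prefix (purged_snap and purged_epoch keys).
  ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db \
      rm-prefix osd_snap

  # Compact to actually reclaim the space.
  ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db \
      compact

  systemctl start ceph-mon@$(hostname -s)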