On Thu, May 30, 2019 at 10:13 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> Hi Zheng-
>
> I'm trying to get rid of the removed_snaps member from pg_pool_t as this
> is a scaling problem for aged clusters with lots of removed snapshots.
> I'm down to a handful of users: rados cache tiered pools, and
> SnapServer::check_osd_map().  I'm not entirely following what the
> code is doing with all_purge vs all_purged.. do you mind taking a look?
>

all_purge holds the snaps in the need_to_purge set that really do still
need to be purged. all_purged holds the snaps in need_to_purge that have
already been purged.

> If we can get away with not using the OSDMap's removed_snaps (and by
> extension is_removed_snap()) at all, that would be ideal.  If the MDS
> really *does* need to know which snaps have been purged from the
> rados pool, then we can switch to using the new_removed_snaps
> OSDMap member instead.  The difference is that new_removed_snaps includes
> the snaps that were removed in the current epoch only, so in order to
> reliably capture all removed snaps, the MDS would need to examine every
> OSDMap epoch (not just the latest map).
>
> It looks to me like the MDS basically needs an ack from the mon that its
> attempt to remove a snap has succeeded, and it's doing that by examining
> the resulting OSDMap.  The mon actually has a durable record for all
> deleted snaps, though, so I suspect the best fix for this is just
> to change the mds <-> mon protocol so that MRemoveSnaps gets an ack back
> after the snap is deleted (or has already been deleted).  Otherwise it
> will be a real challenge for the MDS to ensure that it finds out about
> deleted snaps in the face of MDS restarts and possible gaps in the osdmap
> history...
>
> Does that seem reasonable?
>

The ACK approach should work. The MDS just needs to call
SnapServer::do_server_update() for the ACK.

Regards
Yan, Zheng

> Thanks!
> sage
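
For anyone skimming the thread, here is a rough toy model in Python of the two points above: how need_to_purge splits into all_purge and all_purged, and why relying on the per-epoch new_removed_snaps can miss a removal if an epoch is skipped. It is only a sketch. The function names, the epoch history, and the mon handler are hypothetical and are not Ceph's actual C++ code or message handling.

```python
# Toy model only: names mirror the thread, not Ceph's real C++ types.

def split_need_to_purge(need_to_purge, is_removed):
    """Split pending snaps into (all_purge, all_purged)."""
    all_purge, all_purged = set(), set()
    for snap in need_to_purge:
        # Already removed from the pool -> all_purged; otherwise still to purge.
        (all_purged if is_removed(snap) else all_purge).add(snap)
    return all_purge, all_purged


def removed_seen(new_removed_by_epoch, epochs_seen):
    """Union of per-epoch new_removed_snaps over the epochs actually examined."""
    seen = set()
    for epoch in epochs_seen:
        seen |= new_removed_by_epoch.get(epoch, set())
    return seen


def mon_remove_snaps(durable_deleted, snaps):
    """Hypothetical mon handler: record the deletion durably, then ack every
    requested snap, including ones that had already been deleted."""
    durable_deleted |= set(snaps)
    return set(snaps)


# Hypothetical history: epoch -> snaps removed in that epoch only.
history = {10: {1}, 11: {2}, 12: {3}}
need_to_purge = {1, 2, 3, 4}

# Examining every epoch finds all removals.
full = removed_seen(history, [10, 11, 12])

# An MDS that restarted and missed epoch 11 never learns snap 2 was removed.
gappy = removed_seen(history, [10, 12])
all_purge, all_purged = split_need_to_purge(need_to_purge, lambda s: s in gappy)

# With an explicit ack, the MDS does not depend on seeing every epoch.
mon_record = {1, 2, 3}
acked = mon_remove_snaps(mon_record, all_purge)
```

In this model the gap leaves snap 2 stuck in all_purge even though it is gone from the pool, while the ack path reports it as done because the mon's record already contains it.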