SnapServer::check_osd_map() and is_removed_snap()

Sage Weil <sweil@xxxxxxxxxx> · Thu, 30 May 2019 14:11:47 +0000 (UTC)

Hi Zheng-

I'm trying to get rid of the removed_snaps member from pg_pool_t as this 
is a scaling problem for aged clusters with lots of removed snapshots.  
I'm down to a handful of users: rados cache tiered pools, and 
SnapServer::check_osd_map().  I'm not entirely following what the 
code is doing with all_purge vs. all_purged.  Do you mind taking a look?

If we can get away with not using the OSDMap's removed_snaps (and by 
extension is_removed_snap()) at all, that would be ideal.  If the MDS 
really *does* need to know which snaps have been purged from the 
rados pool, then we can instead switch to using the new_removed_snaps 
OSDMap member.  The difference is that new_removed_snaps includes 
the snaps that were removed in the current epoch only, so in order to 
reliably capture all removed snaps, the MDS would need to examine every 
OSDMap epoch (not just the latest map).
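Here is a minimal sketch of what consuming new_removed_snaps would involve, written in Python for brevity.  OSDMapEpoch and RemovedSnapTracker are hypothetical stand-ins, not Ceph classes, and plain sets replace the real per-pool data structure.  The point is that a delta-only member forces the consumer to see every epoch in order, so a gap has to be detected and handled rather than silently ignored:

```python
from dataclasses import dataclass, field


@dataclass
class OSDMapEpoch:
    """Hypothetical stand-in for one OSDMap epoch (not Ceph's real class).

    new_removed_snaps maps pool id -> snap ids removed in THIS epoch only.
    The real data structure differs; plain sets keep the sketch short.
    """
    epoch: int
    new_removed_snaps: dict[int, set[int]] = field(default_factory=dict)


class RemovedSnapTracker:
    """Rebuilds the full removed-snap set from per-epoch deltas.

    Each map only carries the current epoch's removals, so the tracker must
    see every epoch in order.  Skipping one would silently lose snaps, so a
    gap is raised as an error instead.
    """

    def __init__(self, start_epoch: int = 0) -> None:
        self.last_epoch = start_epoch
        self.removed: dict[int, set[int]] = {}

    def consume(self, m: OSDMapEpoch) -> None:
        if m.epoch != self.last_epoch + 1:
            raise ValueError(
                f"gap in osdmap history: have e{self.last_epoch}, got e{m.epoch}"
            )
        for pool, snaps in m.new_removed_snaps.items():
            self.removed.setdefault(pool, set()).update(snaps)
        self.last_epoch = m.epoch

    def is_removed(self, pool: int, snap: int) -> bool:
        return snap in self.removed.get(pool, set())


# Usage: removals spread over three epochs are only all visible if every
# epoch is consumed.
tracker = RemovedSnapTracker()
for m in [
    OSDMapEpoch(1, {3: {10}}),
    OSDMapEpoch(2),
    OSDMapEpoch(3, {3: {11, 12}}),
]:
    tracker.consume(m)
print(tracker.is_removed(3, 10), tracker.is_removed(3, 12))  # True True
```

An MDS that restarts, or that only ever looks at the latest map, would hit the gap case, which is why the per-epoch approach is fragile.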

It looks to me like the MDS basically needs an ack from the mon that its 
attempt to remove a snap has succeeded, and it's getting that by examining 
the resulting OSDMap.  The mon actually has a durable record of all 
deleted snaps, though, so I suspect the best fix for this is just 
to change the mds <-> mon protocol so that MRemoveSnaps gets an ack back 
after the snap is deleted (or has already been deleted).  Otherwise it 
will be a real challenge for the MDS to ensure that it finds out about 
deleted snaps in the face of MDS restarts and possible gaps in the osdmap 
history...
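A rough sketch of the ack-based alternative, again in Python and entirely hypothetical: RemoveSnapsAck is not an existing message, MonSnapRecord and MdsSnapClient are illustrative names, and the sketch assumes the mon replies only once the deletion is durable and that the MDS persists its own list of pending removals.  Because the mon acks snaps that were already deleted, the MDS can simply resend anything unacked after a restart, with no OSDMap scanning:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoveSnapsAck:
    """Hypothetical reply to an MRemoveSnaps-style request (no such message
    exists today).  Lists the (pool, snap) pairs the mon has recorded as
    deleted."""
    snaps: frozenset[tuple[int, int]]


class MonSnapRecord:
    """Stand-in for the mon's durable record of every deleted snap."""

    def __init__(self) -> None:
        self.deleted: set[tuple[int, int]] = set()

    def handle_remove_snaps(self, snaps: set[tuple[int, int]]) -> RemoveSnapsAck:
        # Idempotent: a snap that is already deleted is simply acked again,
        # so a resend after an MDS restart is harmless.
        self.deleted |= snaps
        return RemoveSnapsAck(frozenset(snaps))


class MdsSnapClient:
    """MDS side.  Assumes `pending` is persisted by the MDS itself so it
    survives restarts (kept in memory here)."""

    def __init__(self) -> None:
        self.pending: set[tuple[int, int]] = set()

    def request_removal(self, pool: int, snap: int) -> None:
        self.pending.add((pool, snap))

    def send_pending(self, mon: MonSnapRecord) -> None:
        # Resend everything not yet acked.
        if self.pending:
            ack = mon.handle_remove_snaps(set(self.pending))
            self.handle_ack(ack)

    def handle_ack(self, ack: RemoveSnapsAck) -> None:
        self.pending -= ack.snaps


mon = MonSnapRecord()
mds = MdsSnapClient()
mds.request_removal(3, 10)
mds.request_removal(3, 11)
mon.handle_remove_snaps({(3, 10)})  # an earlier attempt whose ack was lost
mds.send_pending(mon)               # resend after a restart
print(sorted(mon.deleted), mds.pending)  # [(3, 10), (3, 11)] set()
```

The exchange is shown synchronously for brevity; in practice the ack would arrive asynchronously, but the idempotent "ack even if already deleted" rule is what makes retries safe.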

Does that seem reasonable?

Thanks!
sage