rgw multisite: mdlog transactions for metadata sync

Casey Bodley <cbodley@xxxxxxxxxx> · Mon, 15 Apr 2019 13:09:43 -0400

Hi Yehuda,

I'm working on a design for the cleanup of deleted buckets in multisite. 
To do this, I'd like to trigger some actions on secondary zones when 
metadata sync sees a bucket instance get deleted. The first obstacle 
here is that metadata sync can't differentiate between writes and 
deletes due to how the mdlog transactions are structured.

RGWMetadataManager::pre_modify() writes an mdlog entry with a status of
MDLOG_STATUS_WRITE or MDLOG_STATUS_REMOVE, and post_modify() completes
the transaction with an MDLOG_STATUS_COMPLETE entry. So only the
'prepare' entry records what kind of op it was, and sync can't reliably
associate a COMPLETE with its prepare because mdlog trimming may have
already deleted the prepare.
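
To illustrate the trimming problem, here is a minimal standalone sketch.
The Status, Entry, find_prepare() and trim() names are made up for the
example and are not the real rgw types; it only models a log as a vector
of (key, status) pairs.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the mdlog statuses; not the real Ceph enum.
enum class Status { Write, Remove, Complete };

struct Entry {
  std::string key;  // metadata key, e.g. a bucket instance
  Status status;
};

// Search backwards from the COMPLETE at index `pos` for its prepare entry.
// Returns nullptr if the nearest earlier entry for the same key is itself a
// COMPLETE (an older transaction), or if no earlier entry for the key remains.
const Entry* find_prepare(const std::vector<Entry>& log, std::size_t pos) {
  for (std::size_t i = pos; i-- > 0; ) {
    if (log[i].key != log[pos].key) continue;
    if (log[i].status == Status::Complete) break;
    return &log[i];
  }
  return nullptr;
}

// Drop the oldest `n` entries, the way trimming removes the head of the log.
std::vector<Entry> trim(std::vector<Entry> log, std::size_t n) {
  n = std::min(n, log.size());
  log.erase(log.begin(), log.begin() + static_cast<std::ptrdiff_t>(n));
  return log;
}
```

With a log of {REMOVE, COMPLETE} for one key, find_prepare() on the
COMPLETE finds the REMOVE. After trim(log, 1) only the COMPLETE is left
and the lookup returns nullptr, so the op type is lost.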

In RGWMetaSyncSingleEntryCR, metadata sync filters out any entries that
aren't MDLOG_STATUS_COMPLETE, and tries to infer deletes based on
whether RGWReadRemoteMetadataCR returns ENOENT. The delete should be
recorded explicitly if it's going to trigger further object deletion,
so I'd like to add a separate 'op' field to the mdlog entry for this.
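
Roughly what I have in mind, as a sketch only. LogOp, LogEntry,
SyncAction and decide() are hypothetical names for illustration, and
the Unknown value is my assumption for how entries written before the
field existed would be handled (fall back to the current ENOENT
inference).

```cpp
#include <string>

// Simplified stand-ins; not the real rgw types.
enum class LogStatus { Write, Remove, Complete };
enum class LogOp { Unknown, Write, Remove };  // hypothetical 'op' values

struct LogEntry {
  std::string key;
  LogStatus status;
  LogOp op;  // Unknown models an entry logged without the new field
};

enum class SyncAction { Skip, FetchAndStore, RemoveLocal, InferFromRemote };

// Decide what sync does with one entry. Only COMPLETE entries are acted on,
// and the op carried by the COMPLETE itself says whether it was a delete.
SyncAction decide(const LogEntry& e) {
  if (e.status != LogStatus::Complete) {
    return SyncAction::Skip;
  }
  switch (e.op) {
    case LogOp::Write:   return SyncAction::FetchAndStore;
    case LogOp::Remove:  return SyncAction::RemoveLocal;
    case LogOp::Unknown: break;
  }
  // No op recorded: fall back to reading remote metadata and checking ENOENT.
  return SyncAction::InferFromRemote;
}
```

The point is that the delete decision no longer depends on the prepare
entry surviving trimming, or on what the remote read happens to return.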

I'm also wondering if this separate 'prepare' entry is worth writing, 
given that we ignore it during sync - I'd like to remove it if we can, 
the same way I proposed for the bucket index log in 
https://github.com/ceph/ceph/pull/26755. Do you see a reason to keep
the prepare entry in either log?

Thanks,
Casey