rgw multisite: mdlog transactions for metadata sync
- To: Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx>
- Subject: rgw multisite: mdlog transactions for metadata sync
- From: Casey Bodley <cbodley@xxxxxxxxxx>
- Date: Mon, 15 Apr 2019 13:09:43 -0400
- Cc: The Sacred Order of the Squid Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>
Hi Yehuda,
I'm working on a design for the cleanup of deleted buckets in multisite.
To do this, I'd like to trigger some actions on secondary zones when
metadata sync sees a bucket instance get deleted. The first obstacle
here is that metadata sync can't differentiate between writes and
deletes due to how the mdlog transactions are structured.
RGWMetadataManager::pre_modify() writes an mdlog entry with a status of
MDLOG_STATUS_WRITE or MDLOG_STATUS_REMOVE, and post_modify() completes
the transaction with an MDLOG_STATUS_COMPLETE entry. So only the
'prepare' step knows what kind of op it was, and sync can't reliably
associate a COMPLETE with its prepare because mdlog trimming may have
deleted the prepare.
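
To illustrate the trimming problem, here is a minimal standalone sketch.
It is not the actual rgw code: LogEntry, find_prepare() and trim() are
hypothetical names, and the enum only mirrors the MDLOG_STATUS_* values
mentioned above. It assumes the prepare entry is matched to its COMPLETE
by metadata key.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for MDLOG_STATUS_WRITE/REMOVE/COMPLETE.
enum class MdlogStatus { Write, Remove, Complete };

// Hypothetical, reduced log entry: just a metadata key and a status.
struct LogEntry {
  std::string key;
  MdlogStatus status;
};

// Search backwards from the COMPLETE entry at `pos` for the most recent
// prepare entry (Write or Remove) with the same key. Returns nullopt if
// no prepare entry is left in the log.
std::optional<MdlogStatus> find_prepare(const std::vector<LogEntry>& log,
                                        std::size_t pos) {
  for (std::size_t i = pos; i-- > 0;) {
    if (log[i].key == log[pos].key &&
        log[i].status != MdlogStatus::Complete) {
      return log[i].status;
    }
  }
  return std::nullopt;
}

// Model trimming as dropping the first `n` entries of the log.
std::vector<LogEntry> trim(const std::vector<LogEntry>& log, std::size_t n) {
  if (n >= log.size()) {
    return {};
  }
  return std::vector<LogEntry>(
      log.begin() + static_cast<std::ptrdiff_t>(n), log.end());
}
```

With a log of { Remove, Complete } for one key, find_prepare() on the
COMPLETE entry yields Remove. Once trim() has dropped the first entry,
the same lookup yields nothing, so the op type is lost.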
In RGWMetaSyncSingleEntryCR, metadata sync filters out any entries that
aren't MDLOG_STATUS_COMPLETE, and tries to infer deletes based on
whether RGWReadRemoteMetadataCR returns ENOENT. The delete should be
explicit if it's going to trigger further object deletion, so I'd like
to add a separate 'op' field to the mdlog for this.
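
As a rough sketch of what the 'op' field could enable on the sync side
(again hypothetical: MdlogOp, SyncAction and decide() are illustrative
names, not existing rgw types, and the fallback for entries without an
op is my assumption about how older log entries might be handled):

```cpp
#include <string>

// Simplified stand-ins for MDLOG_STATUS_WRITE/REMOVE/COMPLETE.
enum class MdlogStatus { Write, Remove, Complete };

// Hypothetical 'op' field values. Unknown models entries written
// before the field existed.
enum class MdlogOp { Unknown, Write, Remove };

// Hypothetical, reduced log entry carrying the proposed op field.
struct LogEntry {
  std::string key;
  MdlogStatus status;
  MdlogOp op = MdlogOp::Unknown;
};

// What sync would do for a given entry.
enum class SyncAction {
  Skip,           // not a COMPLETE entry, ignored as today
  FetchAndStore,  // explicit write: read remote metadata and store it
  RemoveLocal,    // explicit delete: remove locally, may trigger cleanup
  FetchAndInfer   // no op recorded: fall back to inferring from ENOENT
};

SyncAction decide(const LogEntry& e) {
  if (e.status != MdlogStatus::Complete) {
    return SyncAction::Skip;
  }
  switch (e.op) {
    case MdlogOp::Write:
      return SyncAction::FetchAndStore;
    case MdlogOp::Remove:
      return SyncAction::RemoveLocal;
    case MdlogOp::Unknown:
      break;
  }
  return SyncAction::FetchAndInfer;
}
```

The point is that RemoveLocal is reached only from an explicit op, so
any bucket cleanup hangs off a recorded delete rather than an ENOENT
from the remote read.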
I'm also wondering if this separate 'prepare' entry is worth writing,
given that we ignore it during sync - I'd like to remove it if we can,
the same way I proposed for the bucket index log in
https://github.com/ceph/ceph/pull/26755. Do you see a reason to keep
either of those?
Thanks,
Casey