rgw: bucket deletion in multisite

Hi rgw folks, this is a rough design for cleanup of deleted buckets in multisite. I would love some review/feedback.

Motivation:
    - Bucket deletion in a multisite configuration currently does not delete bucket instance metadata, bucket sync status, or bucket index objects on any zone. This allows bucket sync on each zone to finish processing object deletions and (hopefully) converge on empty, but it also leaves these objects behind with nothing to clean them up automatically.

Requirements:
    - Remove all objects associated with deleted buckets in a timely manner:
        - bucket instance metadata, bucket index shards, and bucket sync status
        - all object data
    - Does not rely on bucket sync to delete all objects [zone A may delete an empty bucket that hasn't yet synced objects from zone B, so the zones would converge on zone B's objects]
    - Strategy to clean up already-deleted buckets, i.e. the 'radosgw-admin bucket stale-instances rm' command

Summary:
    - Add a process for 'deferred bucket deletion', where local bucket instance metadata is removed and the bucket index/data are scheduled for later 'bucket gc'. A new 'bucket gc list' is stored in omap and processed by a worker similar to existing gc.
    - For metadata sync, the metadata log format needs to be extended to distinguish between normal writes and deletion events on bucket instances. When metadata sync encounters a bucket instance deletion, it runs 'deferred bucket deletion'.
    - Data sync on the bucket needs to avoid creating new objects while bucket gc is running.

mdlog:
    - entries must distinguish between Write, Remove, and Delete (where Delete implies gc of associated data)
    - a 'bucket rm' Deletes its bucket instance metadata
    - a 'bucket reshard' Removes the old bucket instance because the new bucket instance still owns the data
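
As a rough illustration of the three entry types, here is a small Python model. It is only a sketch: MdlogOp, MdlogEntry, the helper functions and the key strings are made-up names for this example, not the actual mdlog encoding or any existing rgw API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class MdlogOp(Enum):
    WRITE = "write"    # metadata was created or updated
    REMOVE = "remove"  # metadata removed, data is still owned elsewhere
    DELETE = "delete"  # metadata removed, associated data must be gc'd


@dataclass(frozen=True)
class MdlogEntry:
    key: str       # hypothetical metadata key for a bucket instance
    op: MdlogOp


def entries_for_bucket_rm(instance_key: str) -> List[MdlogEntry]:
    # 'bucket rm' Deletes its bucket instance metadata
    return [MdlogEntry(instance_key, MdlogOp.DELETE)]


def entries_for_reshard(old_key: str, new_key: str) -> List[MdlogEntry]:
    # reshard writes the new instance and only Removes the old one,
    # because the new instance still owns the data
    return [MdlogEntry(new_key, MdlogOp.WRITE),
            MdlogEntry(old_key, MdlogOp.REMOVE)]


def implies_data_gc(entry: MdlogEntry) -> bool:
    # only Delete implies gc of associated data
    return entry.op is MdlogOp.DELETE
```

The point of keeping Remove and Delete separate is that a peer zone replaying a reshard never schedules gc for data that the new instance still references.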

Bucket gc list:
    - stored in omap in the log pool
    - sharded over multiple objects
    - each entry encodes RGWBucketInfo (needed to delete objects after bucket instance is deleted)
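
One possible way to picture the sharded list, sketched in Python with plain dicts standing in for omap. The shard count, the "bucket.gc.N" object names and the crc32 hash are assumptions made for this example; the proposal does not specify any of them.

```python
import zlib
from typing import Dict, List, Tuple

NUM_SHARDS = 8  # hypothetical; the real shard count would be configurable


def gc_shard_oid(instance_id: str, num_shards: int = NUM_SHARDS) -> str:
    # hash the bucket instance id to pick one shard object
    shard = zlib.crc32(instance_id.encode("utf-8")) % num_shards
    return "bucket.gc.%d" % shard  # hypothetical object naming


class BucketGcList:
    """In-memory stand-in for the omap-backed gc list in the log pool."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        # one dict per shard object: omap key -> omap value
        self.shards: Dict[str, Dict[str, bytes]] = {
            "bucket.gc.%d" % i: {} for i in range(num_shards)
        }

    def add(self, instance_id: str, encoded_info: bytes) -> str:
        # the value carries the encoded bucket info, so gc can still find
        # the bucket's objects after the instance metadata is gone
        oid = gc_shard_oid(instance_id, self.num_shards)
        self.shards[oid][instance_id] = encoded_info
        return oid

    def list_shard(self, oid: str) -> List[Tuple[str, bytes]]:
        # omap listings come back in key order
        return sorted(self.shards[oid].items())
```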

Bucket index:
    - add REMOVE_ONLY flag to bucket index to prevent object creation from racing with bucket gc

Deferred bucket delete:
    - flag bucket index shards as REMOVE_ONLY
    - add to 'bucket gc' list (entry includes encoded RGWBucketInfo) *requires access to existing bucket instance metadata*
    - delete local bucket instance (add Delete entry to mdlog)
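
The ordering of these three steps matters, so here is a minimal Python sketch of it against an in-memory model. Zone, its fields, and the JSON stand-in for the encoded RGWBucketInfo are all hypothetical and only meant to show the sequence.

```python
import json


class Zone:
    """Toy model of the local state touched by deferred bucket delete."""

    def __init__(self):
        self.instances = {}    # instance id -> bucket info (dict)
        self.index_flags = {}  # (instance id, shard) -> set of flags
        self.gc_list = {}      # instance id -> encoded bucket info
        self.mdlog = []        # list of (key, op) tuples


def deferred_bucket_delete(zone, instance_id):
    # requires access to the existing bucket instance metadata
    info = zone.instances.get(instance_id)
    if info is None:
        raise KeyError("no bucket instance metadata for %s" % instance_id)

    # 1. flag every index shard so racing writes are rejected
    for shard in range(info["num_shards"]):
        zone.index_flags.setdefault((instance_id, shard), set()).add("REMOVE_ONLY")

    # 2. record the encoded bucket info in the gc list before the
    #    metadata disappears (JSON here stands in for the real encoding)
    zone.gc_list[instance_id] = json.dumps(info, sort_keys=True).encode("utf-8")

    # 3. delete the local instance and log a Delete entry
    del zone.instances[instance_id]
    zone.mdlog.append((instance_id, "Delete"))
```

Adding the gc entry before removing the instance means that a crash between the two steps leaves the work discoverable rather than orphaning the index and data.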

Metadata sync:
    - must serialize sync of mdlog entries with the same metadata key, to preserve order of Writes vs Removes/Deletes
        - can skip Writes if they're followed by Removes/Deletes
    - on Delete of bucket instance, run deferred bucket delete
    - backward compatibility: what to do with mdlog entries that don't specify Write/Remove/Delete?
        - for bucket instance: assume write (because we never deleted them before upgrade), and just try to fetch
        - for other metadata: use existing strategy to fetch remote metadata, and remove local metadata on 404/ENOENT
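
A small Python sketch of how per-key ordering, Write skipping, and the backward-compatibility rule could fit together. Entries are modeled as (section, name, op) tuples with op set to None for pre-upgrade entries; that tuple shape, the "FetchOrRemove" label and the function names are assumptions for this example only.

```python
from collections import OrderedDict

WRITE, REMOVE, DELETE = "Write", "Remove", "Delete"
FETCH_OR_REMOVE = "FetchOrRemove"  # existing strategy: fetch, remove on 404/ENOENT


def resolve_op(section, op):
    # entries written before the upgrade carry no op
    if op is not None:
        return op
    if section == "bucket.instance":
        return WRITE  # instances were never deleted before the upgrade
    return FETCH_OR_REMOVE


def plan_sync(entries):
    """Group entries by metadata key, keeping log order within each key,
    and drop Writes that are followed by a later Remove/Delete."""
    per_key = OrderedDict()
    for section, name, op in entries:
        per_key.setdefault((section, name), []).append(resolve_op(section, op))

    plan = OrderedDict()
    for key, ops in per_key.items():
        last_removal = max(
            (i for i, o in enumerate(ops) if o in (REMOVE, DELETE)), default=-1)
        plan[key] = [o for i, o in enumerate(ops)
                     if not (o == WRITE and i < last_removal)]
    return plan
```

Different keys can still be processed concurrently; only entries that share a key need to be applied in log order.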

Bucket sync:
    - bucket sync first fetches bucket instance - on ENOENT, exit bucket sync with success
    - if sync_object() returns REMOVE_ONLY error from bucket index, exit bucket sync with success
    - read/fetch bucket instance metadata before taking lease to avoid recreating bucket sync status objects
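
The control flow implied by these rules, sketched in Python. The callbacks, RemoveOnlyError and the returned status strings are hypothetical stand-ins; only the ordering (fetch instance, then lease, then objects) and the two early exits come from the list above.

```python
class RemoveOnlyError(Exception):
    """Stand-in for the REMOVE_ONLY error returned by the bucket index."""


def run_bucket_sync(fetch_instance, take_lease, pending_objects, sync_object):
    # fetch the bucket instance before taking the lease, so a deleted
    # bucket never gets its sync status objects recreated
    info = fetch_instance()
    if info is None:  # ENOENT
        return "instance-gone"  # treated as success

    take_lease()
    for obj in pending_objects:
        try:
            sync_object(info, obj)
        except RemoveOnlyError:
            # bucket gc is pending or running; stop without creating objects
            return "remove-only"  # also treated as success
    return "synced"
```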

Bucket gc worker:
    - for each bucket in gc list:
        - decode RGWBucketInfo
        - delete each object in bucket [should we GC tail objects or delete inline?]
        - delete incomplete multiparts
        - delete bucket index objects
        - delete bucket sync status objects
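
A toy version of the worker loop in Python. The store layout (one dict per kind of object, keyed by bucket id) and the JSON encoding are assumptions for the sketch, and objects are deleted inline here only for simplicity; the open question about tail objects is not settled by this.

```python
import json

# order in which the worker removes each kind of state for a bucket
GC_STEPS = ("objects", "multiparts", "index", "sync_status")


def run_bucket_gc(gc_list, store):
    """gc_list: instance id -> encoded bucket info (JSON bytes here).
    store: kind -> {bucket id -> set of object names}.
    Returns a log of (bucket id, kind) in the order work was done."""
    done = []
    for instance_id in sorted(gc_list):  # sorted() copies the keys
        info = json.loads(gc_list[instance_id])  # decode bucket info
        bucket_id = info["bucket_id"]
        for kind in GC_STEPS:
            store[kind].pop(bucket_id, None)
            done.append((bucket_id, kind))
        # only drop the gc entry once everything it describes is gone
        del gc_list[instance_id]
    return done
```

Removing the gc entry last means an interrupted run simply retries the same bucket, and each step tolerates state that is already gone.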

radosgw-admin bucket stale-instances rm:
    - run deferred bucket delete on each bucket instance that:
        - does not have an associated bucket entrypoint
        - has a bucket id matching its bucket marker? (has not been resharded)
    - must be safe to run on any zone after upgrade
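
The selection rule could look roughly like this in Python. BucketInstance and is_stale are made-up names, "associated entrypoint" is read here as "an entrypoint exists for the bucket's name" (an assumption), and the marker comparison is the tentative check from the list above rather than a settled rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketInstance:
    name: str       # bucket name
    bucket_id: str  # id of this instance
    marker: str     # marker; differs from bucket_id after a reshard


def is_stale(instance, entrypoint_names):
    # candidate only if no entrypoint exists for the bucket's name
    has_entrypoint = instance.name in entrypoint_names
    # tentative: id == marker means the instance was never resharded
    resharded = instance.bucket_id != instance.marker
    return (not has_entrypoint) and (not resharded)


def select_stale(instances, entrypoint_names):
    return [i for i in instances if is_stale(i, entrypoint_names)]
```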


