Re: rgw: design proposal for 'bucket stats --reset-stats'

This looks like a solid algorithm to accomplish the intended task and respect the various constraints imposed. Very nice!!

Eric

> On Nov 4, 2021, at 3:31 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> 
> # motivation
> 
> historically, rgw has had several bugs that led to inconsistencies
> with its 'bucket stats'. currently, the only way to rectify these
> inconsistencies is the 'radosgw-admin bucket reshard' command, because
> the act of resharding rebuilds these stats from scratch in each new
> bucket index shard
> 
> but because this relies on bucket resharding, it can't currently be
> used in multisite configurations. and even once multisite does support
> resharding, the act of resharding still requires radosgw to block
> writes during the process. i think we can do better with a targeted
> command like 'radosgw-admin bucket stats --reset-stats' to match our
> existing 'radosgw-admin user stats --reset-stats'
> 
> in https://github.com/ceph/ceph/pull/23586, Orit pursued an earlier
> 'offline' design which required the shutdown of all radosgws in order
> to rebuild a consistent view of the stats. this work was never
> completed, and 'radosgw-admin bucket reshard' was used instead as a
> workaround
> 
> # requirements
> 
> * reconciles the 'bucket stats' with a full listing of the bucket
> * does not require bucket reshard
> * does not require clients to stop i/o
> * limits the number of bucket index entries per osd op to
> 'osd_max_omap_entries_per_request'
> * prevents racing reset-stats commands from corrupting the stats
> 
> # design
> 
> the stats of each bucket index shard object are stored separately by
> cls_rgw in 'struct rgw_bucket_dir_header'. within each index shard, we
> also track stats per category in member variable
> 'std::map<RGWObjCategory, rgw_bucket_category_stats> stats'. these
> stats are updated by cls_rgw as bucket index transactions complete.
> the 'radosgw-admin bucket stats' command reads the stats from each
> index shard, and sums them up for display
> 
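To make the summing step above concrete, here is a minimal sketch. The types are simplified stand-ins for the cls_rgw structs (the real ones carry more fields), and sum_shard_stats() is a hypothetical helper name, not an existing function.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Simplified stand-ins for the cls_rgw types named in the proposal.
enum class RGWObjCategory : uint8_t { None = 0, Main = 1, Shadow = 2, MultiMeta = 3 };

struct rgw_bucket_category_stats {
  uint64_t total_size = 0;
  uint64_t num_entries = 0;
};

using CategoryStats = std::map<RGWObjCategory, rgw_bucket_category_stats>;

struct rgw_bucket_dir_header {
  CategoryStats stats;  // one map per bucket index shard
};

// Hypothetical helper: add up the per-category stats of every shard header,
// which is what 'bucket stats' is described as doing before display.
CategoryStats sum_shard_stats(const std::vector<rgw_bucket_dir_header>& shards) {
  CategoryStats total;
  for (const auto& header : shards) {
    for (const auto& kv : header.stats) {
      auto& t = total[kv.first];
      t.total_size += kv.second.total_size;
      t.num_entries += kv.second.num_entries;
    }
  }
  return total;
}
```

Because each shard keeps its own header, a reset can in principle proceed shard by shard without coordinating across shards.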
> i propose a new 'bucket stats --reset-stats' command that makes
> consecutive calls to a new cls_rgw_recalc_stats() op, to eventually
> list all of its bucket index entries, accumulate their stats in a
> temporary map, then commit those updated stats once the listing
> reaches the end
> 
> to support other writes to the bucket index during this process, the
> temporary map of stats is stored inside 'struct rgw_bucket_dir_header'
> as 'std::map<RGWObjCategory, rgw_bucket_category_stats> recalc_stats',
> so that bucket index transactions are able to update both the 'stats'
> and 'recalc_stats'. these updates to 'recalc_stats' would be
> conditional on the current position of the 'recalc_marker' - if the
> index entry's key is less than 'recalc_marker', then
> cls_rgw_recalc_stats() has already listed past this entry and won't
> see the change, so the transaction needs to account for it in
> 'recalc_stats'. otherwise, cls_rgw_recalc_stats() will see this entry
> later in its listing and account for it then
> 
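The conditional update described above might look roughly like this. It is only a sketch under stated assumptions: apply_transaction() and its signed delta parameters are hypothetical, the structs are simplified, and 'recalc_marker' is assumed to hold the last key the listing has already covered. Under that assumption the comparison is inclusive; the proposal says "less than", so the exact boundary depends on what the marker ends up meaning.

```cpp
#include <cstdint>
#include <map>
#include <string>

enum class RGWObjCategory : uint8_t { None = 0, Main = 1 };

struct rgw_bucket_category_stats {
  uint64_t total_size = 0;
  uint64_t num_entries = 0;
};

using CategoryStats = std::map<RGWObjCategory, rgw_bucket_category_stats>;

struct rgw_bucket_dir_header {
  CategoryStats stats;
  CategoryStats recalc_stats;  // proposed: totals for the in-progress listing
  std::string recalc_marker;   // proposed: assumed to be the last key listed; empty when idle
};

// Hypothetical helper: apply the stat change of one completed index
// transaction. Deltas are signed so that removals fit the same path.
void apply_transaction(rgw_bucket_dir_header& header, const std::string& key,
                       RGWObjCategory category,
                       int64_t size_delta, int64_t entries_delta) {
  auto apply = [&](CategoryStats& m) {
    auto& s = m[category];
    // unsigned arithmetic wraps, so adding a converted negative delta subtracts
    s.total_size += static_cast<uint64_t>(size_delta);
    s.num_entries += static_cast<uint64_t>(entries_delta);
  };
  apply(header.stats);  // the live stats are always updated
  // Only keys the listing has already passed need to be mirrored; later
  // keys will be counted when the listing reaches them.
  if (!header.recalc_marker.empty() && key <= header.recalc_marker) {
    apply(header.recalc_stats);
  }
}
```

With the marker at "m", a write to key "a" lands in both maps, while a write to key "z" touches only 'stats'.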
> the new cls_rgw operation 'cls_rgw_recalc_stats()' implements the
> logic for a single osd op. this takes as input the marker position to
> resume its listing, and returns the updated marker position as output
> (relying on LIBRADOS_OPERATION_RETURNVEC, since write operations don't
> otherwise return output data). the op itself just lists ~1000 omap
> keys, accumulates
> their stats in 'recalc_stats', then writes the updated 'recalc_stats'
> and 'recalc_marker' position to 'struct rgw_bucket_dir_header'. once
> cls_rgw_recalc_stats() reaches the end of the listing, it can
> overwrite 'stats' with 'recalc_stats', and clear
> 'recalc_stats'/'recalc_marker'
> 
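A single op as described above could be modelled like this. This is an in-memory simulation, not cls_rgw code: IndexShard, IndexEntry and recalc_stats_op() are hypothetical names, a std::map stands in for the omap, keys are assumed non-empty, and max_entries stands in for the per-op limit. The marker is returned directly here rather than through LIBRADOS_OPERATION_RETURNVEC, and the race check is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

enum class RGWObjCategory : uint8_t { None = 0, Main = 1 };

struct rgw_bucket_category_stats {
  uint64_t total_size = 0;
  uint64_t num_entries = 0;
};

using CategoryStats = std::map<RGWObjCategory, rgw_bucket_category_stats>;

struct rgw_bucket_dir_header {
  CategoryStats stats;
  CategoryStats recalc_stats;
  std::string recalc_marker;
};

// Hypothetical in-memory stand-in for one bucket index shard object.
struct IndexEntry {
  RGWObjCategory category = RGWObjCategory::Main;
  uint64_t size = 0;
};

struct IndexShard {
  rgw_bucket_dir_header header;
  std::map<std::string, IndexEntry> entries;  // models the sorted omap keys
};

// One simulated op: list up to max_entries keys after 'marker', accumulate
// them, and return the marker to resume from (empty once the listing is done).
std::string recalc_stats_op(IndexShard& shard, const std::string& marker,
                            std::size_t max_entries) {
  assert(max_entries > 0);
  auto& h = shard.header;
  if (marker.empty()) {
    h.recalc_stats.clear();  // an empty marker starts a fresh listing
  }
  auto it = shard.entries.upper_bound(marker);
  std::string last = marker;
  for (std::size_t n = 0; n < max_entries && it != shard.entries.end(); ++n, ++it) {
    auto& s = h.recalc_stats[it->second.category];
    s.total_size += it->second.size;
    s.num_entries += 1;
    last = it->first;
  }
  if (it == shard.entries.end()) {
    // end of listing: commit the rebuilt stats and clear the temporary state
    h.stats.swap(h.recalc_stats);
    h.recalc_stats.clear();
    h.recalc_marker.clear();
    return std::string();
  }
  h.recalc_marker = last;
  return last;
}
```

Until the final call, readers of 'stats' keep seeing the old values; the switch happens in the same op that finishes the listing.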
> to handle racing invocations of the 'bucket stats --reset-stats'
> command, cls_rgw_recalc_stats() requests with an empty marker will
> always succeed and start with a fresh listing. but when resuming with
> a non-empty marker, cls_rgw_recalc_stats() will compare that marker
> against the stored 'recalc_marker', and return -ECANCELED if they
> don't match to indicate a racing write. the end result is that new
> invocations of 'bucket stats --reset-stats' will cancel any previous
> invocations
> 
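The race handling above, together with the driving loop on the radosgw-admin side, can be sketched as follows. All names here (check_recalc_marker(), RecalcOp, reset_stats()) are hypothetical, and the op is passed in as a callback so the loop can be shown without a cluster; an empty returned marker is assumed to mean the listing finished.

```cpp
#include <cerrno>
#include <functional>
#include <string>

// Hypothetical guard at the top of the op: an empty request marker always
// starts fresh, a non-empty one must match what the header has stored.
int check_recalc_marker(const std::string& request_marker,
                        const std::string& stored_marker) {
  if (request_marker.empty()) {
    return 0;
  }
  return request_marker == stored_marker ? 0 : -ECANCELED;
}

// Stand-in for issuing one op against an index shard: takes the marker to
// resume from, writes the next marker, returns 0 or a negative errno.
using RecalcOp = std::function<int(const std::string&, std::string*)>;

// Hypothetical driver loop for one shard.
int reset_stats(const RecalcOp& op) {
  std::string marker;  // empty on the first call: start a fresh listing
  do {
    std::string next;
    int r = op(marker, &next);
    if (r < 0) {
      return r;  // -ECANCELED: a newer invocation restarted the listing
    }
    marker = next;
  } while (!marker.empty());
  return 0;
}
```

On -ECANCELED the older invocation simply stops; the newer one already owns 'recalc_stats' and will run the listing to completion.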
