# motivation

Historically, RGW has had several bugs that led to inconsistencies in its 'bucket stats'. Currently, the only way to rectify these inconsistencies is the 'radosgw-admin bucket reshard' command, because the act of resharding rebuilds the stats from scratch in each new bucket index shard. But because this relies on bucket resharding, it can't currently be used in multisite configurations. And even once multisite does support resharding, the act of resharding still requires radosgw to block writes for the duration.

I think we can do better with a targeted command like 'radosgw-admin bucket stats --reset-stats', to match our existing 'radosgw-admin user stats --reset-stats'.

In https://github.com/ceph/ceph/pull/23586, Orit pursued an earlier 'offline' design that required the shutdown of all radosgws in order to rebuild a consistent view of the stats. That work was never completed, and 'radosgw-admin bucket reshard' was used as a workaround instead.

# requirements

* reconciles the 'bucket stats' with a full listing of the bucket
* does not require bucket reshard
* does not require clients to stop i/o
* limits the number of bucket index entries per osd op to 'osd_max_omap_entries_per_request'
* prevents racing reset-stats commands from corrupting the stats

# design

The stats of each bucket index shard object are stored separately by cls_rgw in 'struct rgw_bucket_dir_header'. Within each index shard, we also track stats per category in the member variable 'std::map<RGWObjCategory, rgw_bucket_category_stats> stats'. These stats are updated by cls_rgw as bucket index transactions complete.
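For concreteness, here is a simplified sketch of that per-shard bookkeeping and the cross-shard summation. The names mirror cls_rgw (the real 'rgw_bucket_category_stats' and 'rgw_bucket_dir_header' live in src/cls/rgw/cls_rgw_types.h and carry additional fields); the aggregation function is illustrative, not the actual radosgw-admin code path:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// simplified stand-ins for the cls_rgw types described above
enum class RGWObjCategory : uint8_t { None = 0, Main = 1, Shadow = 2, MultiMeta = 3 };

struct rgw_bucket_category_stats {
  uint64_t total_size = 0;
  uint64_t num_entries = 0;
};

struct rgw_bucket_dir_header {
  // per-category stats for this one index shard, maintained by cls_rgw
  // as bucket index transactions complete
  std::map<RGWObjCategory, rgw_bucket_category_stats> stats;
};

// 'radosgw-admin bucket stats'-style aggregation: read each shard's
// header and sum the per-category stats for display
std::map<RGWObjCategory, rgw_bucket_category_stats>
sum_shard_stats(const std::vector<rgw_bucket_dir_header>& shards) {
  std::map<RGWObjCategory, rgw_bucket_category_stats> totals;
  for (const auto& shard : shards) {
    for (const auto& [category, s] : shard.stats) {
      auto& t = totals[category];
      t.total_size += s.total_size;
      t.num_entries += s.num_entries;
    }
  }
  return totals;
}
```

Any bug that lets a shard's header drift out of sync with its omap entries shows up directly in this sum, which is why a rebuild has to re-list the entries themselves.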
The 'radosgw-admin bucket stats' command reads the stats from each index shard and sums them up for display.

I propose a new 'bucket stats --reset-stats' command that makes consecutive calls to a new cls_rgw_recalc_stats() op to eventually list all of a shard's bucket index entries, accumulate their stats in a temporary map, then commit those updated stats once the listing reaches the end.

To support other writes to the bucket index during this process, the temporary map of stats is stored inside 'struct rgw_bucket_dir_header' as 'std::map<RGWObjCategory, rgw_bucket_category_stats> recalc_stats', so that bucket index transactions are able to update both 'stats' and 'recalc_stats'. These updates to 'recalc_stats' are conditional on the current position of the 'recalc_marker': if the index entry's key is less than 'recalc_marker', then cls_rgw_recalc_stats() has already missed this entry and we need to account for it in 'recalc_stats'. Otherwise, cls_rgw_recalc_stats() will see this entry later in its listing and account for it then.

The new cls_rgw operation cls_rgw_recalc_stats() implements the logic for a single osd op. It takes as input the marker position at which to resume its listing, and returns the updated marker position as output (relying on LIBRADOS_OPERATION_RETURNVEC, since this is a write operation). The op itself just lists ~1000 omap keys, accumulates their stats in 'recalc_stats', then writes the updated 'recalc_stats' and 'recalc_marker' position to 'struct rgw_bucket_dir_header'. Once cls_rgw_recalc_stats() reaches the end of the listing, it can overwrite 'stats' with 'recalc_stats' and clear 'recalc_stats'/'recalc_marker'.

To handle racing invocations of the 'bucket stats --reset-stats' command, cls_rgw_recalc_stats() requests with an empty marker will always succeed and start a fresh listing.
When resuming with a non-empty marker, however, cls_rgw_recalc_stats() will compare that marker against the stored 'recalc_marker' and return -ECANCELED if they don't match, to indicate a racing invocation. The end result is that new invocations of 'bucket stats --reset-stats' will cancel any previous invocations.
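To make the interaction between the conditional 'recalc_stats' update and the resumable listing concrete, here is a single-shard model of the proposal. The names ('recalc_stats', 'recalc_marker', the -ECANCELED contract) follow the design above, but the types and signatures are illustrative, not the real cls_rgw API; stats are flattened to one counter instead of the per-category map, and omap entries are modeled as a std::map:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

struct entry_stats { uint64_t size = 0; };

struct dir_header {
  std::map<std::string, entry_stats> entries;  // stand-in for the shard's omap entries
  uint64_t stats = 0;                          // flattened single-category stats
  uint64_t recalc_stats = 0;                   // temporary accumulator
  std::optional<std::string> recalc_marker;    // unset = no recalc in progress
};

// bucket index transaction completing a write: always update 'stats'; also
// fold into 'recalc_stats' only when the key sorts before 'recalc_marker',
// i.e. the in-progress listing has already passed it. (assumes fresh keys;
// real cls_rgw also handles overwrite accounting in its index transactions)
void complete_write(dir_header& h, const std::string& key, uint64_t size) {
  h.entries[key] = {size};
  h.stats += size;
  if (h.recalc_marker && key < *h.recalc_marker) {
    h.recalc_stats += size;
  }
}

// one cls_rgw_recalc_stats() call: an empty marker always starts a fresh
// listing; a non-empty marker must match the stored 'recalc_marker' or the
// op returns -ECANCELED (-125). lists up to 'max_entries' keys, then either
// persists the resume position (returns 1) or, at the end of the listing,
// commits 'recalc_stats' over 'stats' and clears the recalc state (returns 0).
int recalc_stats_op(dir_header& h, std::string& marker, size_t max_entries) {
  if (marker.empty()) {
    h.recalc_stats = 0;                         // fresh listing always succeeds
  } else if (!h.recalc_marker || *h.recalc_marker != marker) {
    return -125;                                // racing invocation detected
  }
  auto it = h.entries.upper_bound(marker);
  size_t count = 0;
  for (; it != h.entries.end() && count < max_entries; ++it, ++count) {
    h.recalc_stats += it->second.size;
    marker = it->first;
  }
  if (it == h.entries.end()) {                  // listing complete: commit
    h.stats = h.recalc_stats;
    h.recalc_stats = 0;
    h.recalc_marker.reset();
    marker.clear();
    return 0;
  }
  h.recalc_marker = marker;                     // persist resume position
  return 1;                                     // more entries remain
}
```

A driver would simply loop on recalc_stats_op() until it returns 0, restarting with an empty marker on -ECANCELED; 'max_entries' plays the role of 'osd_max_omap_entries_per_request'.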