Sync Info: Information about the sync entities (for example: meta, data, bucket).

* Rationale

To simplify the sync code and to separate the sync logic from the info
layout. To provide a framework that could enable alternative sync
entity providers.

* Overview

The rgw sync code handles sync of both metadata and bucket data. It
pulls information about the sync entities from 3 separate types of
providers: meta, data (which buckets need to sync), and bucket. Sync
for each of these types is split into two stages: full sync and
incremental sync. Full sync means that all the relevant data is
iterated over: listing of all meta keys, listing of all bucket
instances, listing of all keys in every bucket. Incremental sync means
that the different changes logs are read and the changes are applied.

It follows that we have 6 separate implementations of the core sync
process. They vary in the way sync information is fetched, but repeat
the same general logic: fetch information, apply changes, update
markers, handle errors. There is therefore an opportunity to
consolidate the sync logic into common generic code. At the minimum we
could combine the full sync and incremental sync stages of each entity
type. Other features, like multi-stage bucket sync that supports
resharding, could also benefit from this.

* Details

New module: sync info provider

The sync info provider will be responsible for providing
meta-information about the sync entity (e.g., how many shards in the
different sync stages, the current state of each stage), and will
provide serial sync info by marker. The marker itself will be opaque
to the target and will reflect the sync stage. The sync info provider
will return the list of source entities that need to be synced, and
the target should be able to apply those changes without specific
knowledge of each stage.
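To illustrate the consolidation idea, here is a minimal sketch (names and
interfaces are hypothetical, not the actual rgw classes) of a single generic
sync loop parameterized only by how entries are fetched and applied:

```python
# Hypothetical sketch of a consolidated sync core: the same loop would
# serve full sync and incremental sync for any entity type, differing
# only in the fetch/apply implementations.

class SyncSource:
    """Abstracts where entries come from (meta keys, datalog, bucket index)."""
    def fetch(self, marker, max_entries):
        # returns (entries, next_marker, done)
        raise NotImplementedError

class SyncTarget:
    """Abstracts how a fetched entry is applied locally."""
    def apply(self, entry):
        raise NotImplementedError

def sync_shard(source, target, marker, save_marker):
    """Generic core: fetch entries, apply changes, checkpoint the marker."""
    done = False
    while not done:
        entries, marker, done = source.fetch(marker, max_entries=100)
        for entry in entries:
            try:
                target.apply(entry)
            except Exception:
                # real code would record the entry in an error repo and retry
                continue
        save_marker(marker)  # persist progress so sync can resume here
    return marker
```

The same `sync_shard` body would replace the six near-duplicate loops; only
`SyncSource`/`SyncTarget` subclasses would differ per entity type and stage.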
The info provided will start with the 'full sync' data (e.g., for
metadata sync it will include all the metadata keys), and when that is
exhausted it will continue with the metadata log entries. The
structure of each info entry will be the same whichever stage it
belongs to.

Each stage can have different sharding properties. For example, the
first stage (full sync) can have a single shard, while the second
stage (incremental sync) can have many more shards. When starting the
sync, the target will send an init request that returns a list of
initial markers for each shard in each of the existing stages. The
shard id and the stage id will be embedded within the marker.

Trimming:

We can leverage this system to simplify the trim logic. The sync info
provider could keep a list of all the targets and their corresponding
current synced marker positions (which the targets will provide). The
trimmers could then use that information instead of polling the
targets for their current state. We could get rid of the trimmers'
polling scheme altogether: the sync info providers would maintain a
central in-memory list of trimming targets that would periodically be
trimmed. (Need to consider backward compatibility.)

The sync info providers should run at the source zone, and a RESTful
API should be created to access their functionality. They should run
at the source so that sync info about targets can be aggregated (for
trimming purposes). However, we should create a target-side interface
that provides a functional interface for the sync code. We can create
alternative target-side implementations for managing sources that do
not support this API (e.g., for backward compatibility and other
non-rgw sources). A sync info provider client wrapper can be created
to enable cases like full sync of metadata, where the source info is
not sharded.
The generic wrapper would fetch the data and store it in temporary
queues (as the current full metadata sync does), so that the sync
process itself could run concurrently on multiple shards. It should be
transparent to the caller; from the caller's point of view it would be
fetching data from a sharded source.

sync_info_init:
  input: {
    my_id
    entity_id
  }
  output: {
    sync_info_id
    stages[] = {
      stage_id
      num_shards
      markers[] = {
        string marker
      }
    }
  }

sync_info_fetch:
  input: {
    sync_info_id
    marker
    max_entries
    optional: sync_position (for trimming)
  }
  output: {
    status = { have_more | done | stage_done | who_are_you }
    entries[] = {
      marker_id
      info (depending on entity type)
    }
  }

* The entry info will depend on the entity type. It will usually
  include a timestamp and other type-specific fields. For example, the
  bucket entity type would include an op field that data entities
  would not have.
* When the data of a specific sync stage is exhausted, the status will
  reflect it.
* The source might decide to remove a target from its list of targets
  if that target hasn't contacted it for a while. This can happen if a
  target went down, and is needed to allow the source to trim its
  logs. If the target returns, the source will send an error message
  that requires the target to re-initialize its sync process.
* If there is no more data in the current stage, the stage_done status
  will be returned. The target will only start working on the next
  stage after all the current stage's shards are complete. When all
  shards are complete, the sync process should initiate sync on all
  the shards of the new stage.
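The status handling above can be sketched as a per-shard client loop. This
is a hypothetical rendering (the provider object and its `fetch` signature
are assumptions for illustration), but the four statuses and their handling
follow the description above:

```python
# Hypothetical target-side loop driving sync_info_fetch for one shard.
# Statuses mirror the documented ones: have_more, done, stage_done,
# who_are_you.
HAVE_MORE, DONE, STAGE_DONE, WHO_ARE_YOU = range(4)

def run_shard(provider, sync_info_id, marker, apply_entry):
    while True:
        status, entries = provider.fetch(sync_info_id, marker, max_entries=100)
        for e in entries:
            apply_entry(e["info"])
            marker = e["marker_id"]  # advance past each applied entry
        if status == WHO_ARE_YOU:
            # the source dropped us as a target; caller must re-init sync
            return "reinit", marker
        if status == STAGE_DONE:
            # this shard is exhausted; wait for the other shards, then
            # the caller transitions to the next stage
            return "stage_done", marker
        if status == DONE:
            return "done", marker
        # HAVE_MORE: keep fetching with the advanced marker
```

The caller would run one such loop per shard and only move to the next
stage once every shard has returned "stage_done".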
If the target does not have any information about the next stage
(e.g., after a bucket reshard), it will query the source for that
information:

sync_info_update_position:
  input: {
    sync_info_id
    sync_position_marker
  }
  output: {
    status
  }

sync_info_next_stage_info:
  input: {
    cur_stage_id
  }
  output: {
    status
    next_stage_id
    num_shards
    markers[] = {
      string marker
    }
  }

Note that the returned markers are the markers needed for
transitioning to the next stage. The markers returned when
transitioning from full sync to incremental sync reflect the maximum
log positions at the time the sync started. The markers returned when
transitioning between different incremental sync stages (e.g.,
different reshard generations) are the minimum log positions (or even
an empty position) of the next generation.

* Development

Initial work & Metadata Sync

  * Source-side sync info provider core
    * define functional interfaces
      * abstract SyncInfoProvider
      * abstract SIPEntity
    * Control
      * store/read target state
    * marker tools
  * First implementation
    * create provider for metadata
      * SyncInfoProvider_Meta
      * SIPEntity_Meta
    * radosgw-admin to control SyncInfoProvider hooks
    * REST api
  * Target-side core
    * define abstract SIPClient
    * implement SIPClient_REST
    * Coroutines implementation
    * radosgw-admin to control SIPClient hooks
  * Meta sync
    * modify sync init
    * convert incremental sync to use SIP
    * stage transition
    * remove full sync
  * Source-side trimming module
    * generic core
    * implement mdlog trimmer

TODO:
  - backward compatibility plan
  - trimming in mixed versions environment?
  - radosgw-admin sync status
  - testing

Data & Bucket Sync

  * Data Sync
    * source-side SyncInfoProvider_Data
    * convert sync code similar to meta sync
  * Bucket Sync
    * SyncInfoProvider_BucketInstance
    * bucket instance sync code
  * Optional
    * common sync core
    * modify all implementations to use common core

Yehuda