On Thu, Jul 15, 2021 at 2:25 PM star fan <jfanix@xxxxxxxxx> wrote:
>
> We saw some abnormal sync status when running multiple rgw (15.2.14)
> instances with multisite sync, so I dug into the rgw sync code.
> I think there is an issue in the rgw sync concurrency implementation,
> if I understand it correctly.
> A critical process that we want to run only once is implemented with
> the following steps:
> 1. read shared status object
> 2. check status
> 3. lock status
> 4. critical process
> 5. store status
> 6. unlock
>
> The problem in the concurrent case is that the critical process can
> run multiple times, because it acts on the old status that was read
> before the lock was taken, so the lock serves no purpose.
> The steps should be as below:
> 1. read shared status object
> 2. check status
> 3. lock status
> 4. read and check status again
> 5. critical process
> 6. store status
> 7. unlock
>
> One example is below:
>   do {
>     r = run(new RGWReadSyncStatusCoroutine(&sync_env, &sync_status));
>     if (r < 0 && r != -ENOENT) {
>       tn->log(0, SSTR("ERROR: failed to fetch sync status r=" << r));
>       return r;
>     }
>
>     switch ((rgw_meta_sync_info::SyncState)sync_status.sync_info.state) {
>       case rgw_meta_sync_info::StateBuildingFullSyncMaps:
>         tn->log(20, "building full sync maps");
>         r = run(new RGWFetchAllMetaCR(&sync_env, num_shards,
>                                       sync_status.sync_markers, tn));
>
> Also, the omap keys are not deleted after a sync entry finishes in
> the full_sync process, so full_sync can run multiple times in the
> concurrent case.
>
> This has no significant impact on data sync, because bucket sync is
> idempotent, but metadata sync is not.

Moving to dev@xxxxxxx and adding Casey.

Thanks,

                Ilya
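
For reference, the check, lock, re-check sequence proposed in the quoted message can be sketched in isolation. This is only a minimal illustration using std::mutex and std::thread, not the RGW coroutine or lock machinery. SharedStatus, sync_worker, and run_workers are hypothetical names that do not appear in the Ceph source, and the spin barrier is there only to force every worker to read the status before any of them takes the lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the sync state stored in the shared status object.
enum class SyncState { BuildingFullSyncMaps, Sync };

// Hypothetical stand-in for the shared status object and its lock.
struct SharedStatus {
  std::mutex lock;                     // plays the role of the status lock
  std::atomic<SyncState> state{SyncState::BuildingFullSyncMaps};
  std::atomic<int> arrived{0};         // barrier counter, demo only
  std::atomic<int> critical_runs{0};   // times the critical process ran
};

// One worker following the steps from the message. With
// recheck_after_lock == false it follows the 6-step sequence; with
// recheck_after_lock == true it follows the proposed 7-step sequence.
void sync_worker(SharedStatus& s, int num_workers, bool recheck_after_lock) {
  SyncState seen = s.state.load();                      // read shared status
  if (seen != SyncState::BuildingFullSyncMaps) return;  // check status

  // Demo-only barrier: wait until every worker has read the (soon stale)
  // status, so the race is reproduced on every run.
  s.arrived.fetch_add(1);
  while (s.arrived.load() < num_workers) std::this_thread::yield();

  std::lock_guard<std::mutex> guard(s.lock);            // lock status
  if (recheck_after_lock) {
    seen = s.state.load();                              // read status again
    if (seen != SyncState::BuildingFullSyncMaps) return;  // check again
  }
  s.critical_runs.fetch_add(1);                         // critical process
  s.state.store(SyncState::Sync);                       // store status
}                                                       // unlock (guard released)

// Runs num_workers concurrent workers and returns how many times the
// critical process executed.
int run_workers(int num_workers, bool recheck_after_lock) {
  assert(num_workers > 0);
  SharedStatus s;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_workers; ++i) {
    threads.emplace_back([&s, num_workers, recheck_after_lock] {
      sync_worker(s, num_workers, recheck_after_lock);
    });
  }
  for (auto& t : threads) t.join();
  return s.critical_runs.load();
}
```

Under these assumptions, run_workers(4, false) executes the critical step four times, once per worker, while run_workers(4, true) executes it once. Whether the real RGW status lock and coroutines behave the same way depends on details not shown in the excerpt above.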