(replying inline)

On Thu, Mar 12, 2020 at 3:16 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
> One option would be to change the mapping in choose_oid() so that it
> doesn't spread bucket shards. But doing this in a way that's
> backward-compatible with existing datalogs would be complicated and
> probably require a lot of extra I/O (for example, writing all datalog
> entries on the 'wrong' shard to the error repo of the new shard so they
> get retried there). This spreading is an important scaling property,
> allowing gateways to share and parallelize a sync workload that's
> clustered around a small number of buckets.

I agree. We know this to be important.

> So I think the better option is to minimize this bucket-wide
> coordination, and provide the locking/atomicity necessary to do it
> across datalog shards. The common case (incremental sync of a bucket
> shard) should require no synchronization, so we should continue to track
> that incremental sync status in per-bucket-shard objects instead of
> combining them into a single per-bucket object.
>
> We do still need some per-bucket status to track the full sync progress
> and the current incremental sync generation.
>
> If the bucket status is in full sync, we can use a cls_lock on that
> bucket status object to allow one shard to run the full sync, while
> shards that fail to get the lock will go to the error repo for retry.
> Once full sync completes, shards can go directly to incremental sync
> without a lock.

Yes, this seems reasonable.

> To coordinate the transitions across generations (reshard events), we
> need some new logic when incremental sync on one shard reaches the end
> of its log. Instead of cls_lock here, we can use cls_version to do a
> read-modify-write of the bucket status object.
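For illustration, the full-sync coordination described above can be modeled with a small sketch. This is Python pseudocode of the protocol, not RGW code: the class and function names are hypothetical, and the in-memory lock stands in for a real cls_lock on the bucket status object.

```python
# Toy model of per-bucket full-sync coordination (hypothetical; not RGW code).
# One status object per bucket; datalog-shard workers race for a lock on it.
# The winner runs full sync for the whole bucket; losers park their work in
# the error repo for retry. Incremental sync needs no bucket-wide lock.

class BucketSyncStatus:
    def __init__(self):
        self.state = "full-sync"   # becomes "incremental" once full sync is done
        self.lock_owner = None     # models cls_lock on the status object

    def try_lock(self, owner):
        if self.lock_owner is None:
            self.lock_owner = owner
            return True
        return False

    def unlock(self, owner):
        if self.lock_owner == owner:
            self.lock_owner = None

def sync_bucket_shard(status, shard, error_repo):
    """One datalog-shard worker processing an entry for this bucket."""
    if status.state == "full-sync":
        if not status.try_lock(shard):
            error_repo.append(shard)   # lost the race: retry via error repo
            return "deferred"
        # winner runs full sync for the whole bucket, then transitions
        status.state = "incremental"
        status.unlock(shard)
        return "full-sync-done"
    # incremental sync of one bucket shard requires no synchronization
    return "incremental"
```

The point of the scheme is that only the lock winner pays the cost of bucket-wide full sync; everyone else defers through the existing error-repo retry path, and the common incremental case stays lock-free.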
> This will either add our
> shard to the set of completed shards for the current generation, or (if
> all other shards completed already) increment that generation number so
> we can start processing incremental sync on shards from that generation.
>
> This also modifies the handling of datalog entries for generations that
> are newer than the current generation in the bucket status. In the
> previous design draft, we would process these inline by running
> incremental sync on all shards of the current generation - but this
> would require coordination across datalog shards. So instead, each
> unfinished shard in the bucket status can be written to the
> corresponding datalog shard's error repo so it gets retried there. And
> once these retries are successful, they will drive the transition to the
> next generation and allow the original datalog entry to retry and make
> progress.

ok

> Does this sound reasonable? I'll start preparing an update to the design
> doc.

Sounds promising to me.

Matt

> On 3/11/20 11:53 AM, Casey Bodley wrote:
> > Well shoot. It turns out that some critical pieces of this design were
> > based on a flawed assumption about the datalog; namely that all
> > changes for a particular bucket would map to the same datalog shard. I
> > now see that writes to different shards of a bucket are mapped to
> > different shards in RGWDataChangesLog::choose_oid().
> >
> > We use cls_locks on these datalog shards so that several gateways can
> > sync the shards independently.
> > But this isn't compatible with several
> > elements of this resharding design that require bucket-wide coordination:
> >
> > * changing bucket full sync from per-bucket-shard to per-bucket
> >
> > * moving bucket sync status from per-bucket-shard to per-bucket
> >
> > * waiting for incremental sync of each bucket shard to complete before
> > advancing to the next log generation
> >
> > On 2/25/20 2:53 PM, Casey Bodley wrote:
> >> I opened a pull request https://github.com/ceph/ceph/pull/33539 with
> >> a design doc for dynamic resharding in multisite. Please review and
> >> give feedback, either here or in PR comments!
> >>
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage

tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
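A rough model of the cls_version-based generation handoff discussed earlier in the thread: when incremental sync on a shard reaches the end of its log, it does a read-modify-write of the per-bucket status, either recording itself as complete or, if it is the last shard, advancing the generation. This is a Python sketch with hypothetical names; the conditional write is modeled by a plain version check rather than real cls_version.

```python
# Toy model of the generation handoff via read-modify-write (hypothetical;
# the real implementation would use cls_version for the conditional write).

class BucketStatus:
    def __init__(self, num_shards):
        self.generation = 0
        self.completed = set()     # shards done with the current generation
        self.num_shards = num_shards
        self.version = 0           # models the cls_version object version

def on_shard_reached_end_of_log(status, shard, read_version):
    """Called when incremental sync on `shard` reaches the end of its log.

    Returns False if the object changed since `read_version` was read, in
    which case the caller must re-read and retry (as with cls_version)."""
    if read_version != status.version:
        return False               # lost the race: re-read and retry
    if status.completed | {shard} == set(range(status.num_shards)):
        # we are the last unfinished shard: advance to the next generation
        status.generation += 1
        status.completed = set()
    else:
        status.completed.add(shard)
    status.version += 1            # every successful write bumps the version
    return True
```

This avoids any lock: concurrent writers simply race on the version, and the loser retries against the updated status, so the transition happens exactly once.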
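To illustrate the flawed assumption Casey describes, here is a toy version of the spreading behavior. The real mapping lives in RGWDataChangesLog::choose_oid(); this sketch only assumes it combines a bucket-name hash with the bucket shard id (the hash function and shard count here are made up), which is enough to show why one bucket's shards land on different datalog shards.

```python
# Toy illustration of why datalog entries for one bucket spread across
# datalog shards (hypothetical hash and constants; the real mapping is
# RGWDataChangesLog::choose_oid() in C++).

NUM_DATALOG_SHARDS = 128

def hash_str(s):
    # stand-in for Ceph's string hash (not the real function)
    h = 0
    for c in s:
        h = (h * 31 + ord(c)) & 0xFFFFFFFF
    return h

def choose_oid(bucket_name, bucket_shard_id):
    # Offsetting a per-bucket hash by the bucket shard id means
    # consecutive bucket shards map to different datalog shards.
    return (hash_str(bucket_name) + bucket_shard_id) % NUM_DATALOG_SHARDS
```

This spreading is what lets multiple gateways, each holding cls_locks on different datalog shards, parallelize sync for a hot bucket, and it is also why any per-bucket state now needs cross-shard coordination.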