RGW - Multisite setup -> question about Bucket - Sharding, limitations and synchronization

Hello,

(everything in context of S3)


I'm currently trying to better understand bucket sharding in combination with a multisite RGW setup and the possible limitations.

At the moment I understand that a bucket has a bucket index, which is a list of objects within the bucket.

There are also indexless buckets, but those are not usable for cases like a multisite RGW bucket, where you need a [delayed] consistent relation/state between bucket n [zone a] and bucket n [zone b].

Those bucket indexes are stored in "shards", and the shards get distributed over the whole cluster of a zone for scaling purposes.
Red Hat recommends a maximum of 102,400 objects per shard and recommends this formula to determine the right number of shards for a bucket:

number of objects expected in the bucket / 100,000
The maximum number of supported shards (or tested limit) is 7877 shards.

That results in a total limit of 787,700,000 objects per bucket, as long as you want to stay in known and tested waters.
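Just to make sure I read that formula correctly, here is how I would apply it (the option names below are only what I assume to be the relevant knobs, please correct me if I picked the wrong ones):

# expecting ~50,000,000 objects in one bucket:
#   50,000,000 / 100,000 = 500 index shards for that bucket
# which I would presumably set as a default for new buckets via
ceph config set client.rgw rgw_override_bucket_index_max_shards 500
# or per zone via "bucket_index_max_shards" in the zonegroup config
radosgw-admin zonegroup get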

Now some of the things I did not understand 100%:

= QUESTION 1 =

Does each bucket have its own shards? E.g.

If bucket 1 reaches its shard limit of 7877 shards, can I then create other buckets which start with their own fresh sets of shards?
OR is it the other way around, which would mean all buckets store their index in the same shards, and if I reach the shard limit I need to create a second cluster?
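For context, this is how I have been looking at shard usage per bucket so far, assuming I read the output of these commands correctly (the bucket name is just an example):

# objects per shard and fill status, per bucket
radosgw-admin bucket limit check
# bucket id (and on newer versions the shard count) of a single bucket
radosgw-admin bucket stats --bucket=mybucket
# manually reshard a single bucket to e.g. 200 shards
radosgw-admin bucket reshard --bucket=mybucket --num-shards=200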

= QUESTION 2 =
How are these shards distributed over the cluster? I expect they are just objects in the <zone>.rgw.buckets.index pool, is that correct?
So, these ones:
rados ls -p a.rgw.buckets.index 
.dir.3638e3a4-8dde-42ee-812a-f98e266548a4.274451.1
.dir.3638e3a4-8dde-42ee-812a-f98e266548a4.87683.1
.dir.3638e3a4-8dde-42ee-812a-f98e266548a4.64716.1
.dir.3638e3a4-8dde-42ee-812a-f98e266548a4.78046.2
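If that is correct, I assume the long id in those object names is the bucket instance id, so it should be possible to map an index shard object back to its bucket roughly like this (bucket name again just an example):

# get the bucket id of a known bucket
radosgw-admin bucket stats --bucket=mybucket | grep '"id"'
# list the index shard objects that carry this id
rados ls -p a.rgw.buckets.index | grep <id from above>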

= QUESTION 3 =

Do these bucket index shards have any relation to the RGW sync shards in an RGW multisite setup?
E.g. if I have a ton of bucket index shards or buckets, does that have any impact on the sync shards?

radosgw-admin sync status
          realm f0019e09-c830-4fe8-a992-435e6f463b7c (mumu_1)
      zonegroup 307a1bb5-4d93-4a01-af21-0d8467b9bdfe (EU_1)
           zone 5a9c4d16-27a6-4721-aeda-b1a539b3d73a (b)
  metadata sync syncing
                full sync: 0/64 shards                    <= these ones I mean
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 3638e3a4-8dde-42ee-812a-f98e266548a4 (a)
                        syncing
                        full sync: 0/128 shards            <= and these ones
                        incremental sync: 128/128 shards   <= and these ones
                        data is caught up with source
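While digging around I stumbled over the following options, which I assume are where the 64 and 128 above come from, but I am not sure whether they interact with the bucket index shards at all:

# metadata log shards (default 64, as far as I can tell)
ceph config get client.rgw rgw_md_log_max_shards
# data changes log shards (default 128, as far as I can tell)
ceph config get client.rgw rgw_data_log_num_shards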


= QUESTION 4 =
(switching to sync shard related topics)


What is the exact function and purpose of the sync shards? Do they implement any limit? E.g. maybe a maximum number of object entries waiting for synchronization to zone b.
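For reference, these are the commands I have been poking at while trying to figure that out, assuming the data and metadata logs are what "sync shards" refers to:

# per-shard markers of the data changes log
radosgw-admin datalog status
# entries currently waiting in the data changes log
radosgw-admin datalog list
# same for the metadata log
radosgw-admin mdlog status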


= QUESTION 5 = 
Are those sync shards processed in parallel or sequentially? And where are those shards stored?
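My guess is that they live as objects in the log pool of the zone, but I would like a confirmation. This is what I was looking at (pool name from my test cluster, the object name prefixes are just my assumption):

# I expect data_log.<n> and meta.log.<period>.<n> objects here
rados ls -p a.rgw.log | grep -E '^(data_log|meta\.log)'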


= QUESTION 6 = 
As far as I have experienced it, the sync process pretty much works like this:

1.) The client sends an object or an operation to rados gateway A (RGW A)
2.) RGW A logs this operation into one of its sync shards and executes the operation against its local storage pool
3.) RGW B checks via GET requests at a regular interval whether any new entries have appeared in the RGW A log
4.) If a new entry exists, RGW B executes the operation against its local pool or pulls the new object from RGW A

Did I understand that correctly? (For my rough description of this functionality, I want to apologize to the developers, who surely invested a lot of time and effort into designing and building that sync process.)
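(In case it matters: the picture above is mostly what I pieced together from watching these commands while uploading objects, so it may well be wrong.)

# data sync progress of zone b against its source zone
radosgw-admin data sync status --source-zone=a
# entries that failed to sync
radosgw-admin sync error list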

And if I understand it correctly, what would the exact strategy look like in a multisite setup to resync e.g. a single bucket where one zone got corrupted and has to be brought back into a synchronous state?
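Would it be something along these lines? This is just my guess from reading the radosgw-admin help output, the bucket name is made up:

# check how far the bucket on the corrupted zone is behind
radosgw-admin bucket sync status --bucket=mybucket
# restart a full sync of just that bucket from the healthy zone
radosgw-admin bucket sync init --bucket=mybucket --source-zone=a
# then let the RGW daemons pick it up, or run it by hand
radosgw-admin bucket sync run --bucket=mybucket --source-zone=a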


I hope this is the correct place to ask such questions.

Best Regards,
Daly
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



