MultiSite Sync Problem and Shard Number Relation?

Hi,

I am somewhat of a newbie to RGW multisite. I gather there is an important limitation around bucket index sharding when you run multisite, and I would like to understand it better or be corrected. I also want to leave a bookmark here for future cephers if possible. I apologize if this has been asked before, but I was not able to find a good explanation in the mailing list archives.

AFAIK RGW needs bucket indexes to serve bucket listings. Indexing is optional, but in practice it is almost always wanted.

In a multisite setup you have to choose the number of bucket index shards up front, because multisite does not support dynamic resharding yet.
AFAIK once a bucket hits the per-shard object limit, bucket sync gets into trouble or may stop entirely, and you have to plan a cluster outage in order to reshard the bucket and resync it from scratch.
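As far as I can tell (please correct me if these are not the right tools), the per-shard fill level can be checked with something like the following; the bucket name "mybucket" is just a placeholder:

    radosgw-admin bucket stats --bucket=mybucket   # total object count for the bucket
    radosgw-admin bucket limit check               # shard count, objects per shard and fill_status per bucket

My understanding is that the "bucket limit check" output is what warns you when a bucket is approaching or over the resharding threshold.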

I would like to verify whether this is a correct assumption.

Here is the problem I have run into right now.

I have two Nautilus 14.2.9 RGW multisite clusters. Since I was not fully aware of the multisite/bucket limitations, I now find myself with a fairly large bucket holding about 256 million objects.
Now my problem is that bucket syncing seems to be stopped or stalled. I am not 100% certain which, and I have failed to figure out the exact problem from "radosgw-admin bucket sync status". It is quite possible that I simply don't know the proper tool to find the root cause yet.
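For reference, these are the commands I have been looking at so far; if there is a better way to pinpoint where sync is stuck, I would love to hear it (the bucket name is a placeholder):

    radosgw-admin sync status                            # overall metadata/data sync state of the zone
    radosgw-admin bucket sync status --bucket=mybucket   # per-bucket sync state against the source zone
    radosgw-admin sync error list                        # any recorded sync errors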

From the RGW logs I could capture this: "check_bucket_shards: resharding needed: stats.num_objects=256166901 shard max_objects=75000000". The cluster ceph.conf includes "rgw override bucket index max shards = 750".
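If I read those numbers right, the 75000000 figure is simply my 750 shards multiplied by the default rgw_max_objs_per_shard of 100000 objects per shard, and the bucket is well past that, which is why RGW keeps flagging it for resharding:

    750 shards * 100000 objects/shard = 75,000,000 objects
    256,166,901 objects > 75,000,000  ->  "resharding needed"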

I suppose I need to reshard this bucket before syncing can continue. And the only way to reshard, as far as I can tell, is to stop all RGW services of both clusters, delete the secondary zone, reshard the bucket, and resync the bucket from the beginning. A better long-term solution would be to split this large bucket into smaller buckets, but there seems to be no easy way to do that other than migrating with some kind of S3 sync tool (preferably a fast one!).
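From what I can piece together from the documentation (and I may well be wrong here, so please correct me), the manual multisite reshard procedure only purges the affected bucket on the secondary zone rather than deleting the whole zone. Roughly, with "mybucket" as a placeholder and the new shard count just an example:

    # on the master zone
    radosgw-admin bucket sync disable --bucket=mybucket
    # stop all radosgw daemons in every zone, then on the master zone:
    radosgw-admin bucket reshard --bucket=mybucket --num-shards=3000
    # on each secondary zone, drop the stale copy of the bucket
    radosgw-admin bucket rm --bucket=mybucket --purge-objects
    # restart the radosgw daemons, then on the master zone
    radosgw-admin bucket sync enable --bucket=mybucket

After that the bucket would presumably have to do a full sync to the secondary again, which for 256 million objects is exactly the part I am dreading.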

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


