Re: RGW returning HTTP 500 during resharding

> 
> No retries.
> Is it expected that resharding can take so long?
> (in a setup with all NVMe drives)

A few questions first:

- Which drive SKU(s)? How full are they? Is their firmware up to date?
- How many RGWs? Have you tuned your server network stack? Disabled Nagle?
- How many bucket OSDs? How many index OSDs? How many PGs in the bucket and index pools?
- How many buckets? Do you have something like 200M objects per bucket?
- Do you have the default max objects/shard setting?

Tiny objects are the devil of many object systems, and I can think of ways the questions above could bear on this case.  Resharding in advance might help.
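As a rough illustration of sizing a bucket's shard count before bulk-loading (the `shards_needed` helper below is my own sketch, not a Ceph API, and it assumes the default rgw_max_objs_per_shard of 100,000; check your cluster's actual value):

```python
# Sketch: estimate how many index shards a bucket needs before a bulk load,
# assuming the default rgw_max_objs_per_shard = 100,000 (verify on your
# cluster; this value is configurable).
import math

def shards_needed(expected_objects: int, max_objs_per_shard: int = 100_000) -> int:
    """Return a shard count keeping each index shard under the per-shard limit.

    An odd shard count tends to spread keys a bit more evenly, so round
    even results up by one.
    """
    n = max(1, math.ceil(expected_objects / max_objs_per_shard))
    return n if n % 2 else n + 1

# e.g. 200M objects -> 2001 shards; then reshard in advance with something like:
#   radosgw-admin bucket reshard --bucket=<name> --num-shards=<n>
print(shards_needed(200_000_000))
```

Pre-sharding to roughly the final size avoids the repeated online reshards (and associated write blocking) that a large sequential ingest would otherwise trigger.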




> And is it correct behavior that it returns HTTP response code 500, instead of something that could indicate it is a retryable condition?
> 
> If I add my own code that retries for a long time, is there any way to detect that the 500 is due to resharding, rather than some other condition that is fatal?
> Also, is there any more efficient way to get a large amount of objects into Ceph than individual PUTs?
> Yours sincerely,
> 
> Floris Bos
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


