Re: Performance issues RGW (S3)

There you go.

Tiny objects are the hardest thing for any object storage service: you suffer space amplification, and metadata operations become a very large fraction of the overall workload.

With 500KB objects you may waste a significant fraction of underlying space -- especially if you have large-IU QLC OSDs, or OSDs created with an older Ceph release where the min_alloc_size was 64KB vs the current 4KB.  This is exacerbated by EC if you're using it, as many do for bucket pools.
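A back-of-the-envelope example, assuming EC 4+2 and a 64KB allocation unit (your EC profile and min_alloc_size may differ):

    500KB object -> 4 data chunks + 2 parity chunks, ~125KB each
    each 125KB chunk rounds up to 2 x 64KB = 128KB on disk
    6 x 128KB = 768KB raw vs. 750KB ideal  (~2-3% padding)

The padding grows quickly as objects shrink: a 10KB object under the same assumptions burns 6 x 64KB = 384KB of raw space against only 15KB of ideal raw usage.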

Bluestore Space Amplification Cheat Sheet:
https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?gid=358760253#gid=358760253


Things to do:  disable Nagle's algorithm on the RGW frontend via the tcp_nodelay option:  https://docs.ceph.com/en/quincy/radosgw/frontends/
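A minimal sketch for the Beast frontend (the section name and port here are placeholders; adjust for your deployment):

    [client.rgw.yourhost]
    rgw_frontends = beast port=8080 tcp_nodelay=1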

Putting your index pool on as many SSDs as you can would also help; I don't recall if it's on HDD now.  The index pool doesn't hold much data, but it benefits from a generous pg_num and multiple OSDs so that it isn't bottlenecked.
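A sketch using CRUSH device classes (the pool name assumes the default zone and rgw-index-ssd is a name I made up; pick a pg_num appropriate for your OSD count):

    ceph osd crush rule create-replicated rgw-index-ssd default host ssd
    ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd
    ceph osd pool set default.rgw.buckets.index pg_num 256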


> On Jun 13, 2024, at 15:13, Sinan Polat <sinan@xxxxxxxx> wrote:
> 
> 500K object size
> 
>> On 13 Jun 2024 at 21:11, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>> 
>> How large are the objects you tested with?  
>> 
>>> On Jun 13, 2024, at 14:46, sinan@xxxxxxxx wrote:
>>> 
>>> I have been doing some further testing.
>>> 
>>> My RGW pool is placed on spinning disks.
>>> I created a 2nd RGW data pool, placed on flash disks.
>>> 
>>> Benchmarking on HDD pool:
>>> Client 1 -> 1 RGW Node: 150 obj/s
>>> Client 1-5 -> 1 RGW Node: 150 obj/s (30 obj/s each client)
>>> Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s
>>> Client 1-5 -> HAProxy -> 3 RGW Nodes: 150 obj/s (30 obj/s each client)
>>> 
>>> I did the same tests against the RGW pool on flash disks: same results.
>>> 
>>> So, it doesn't matter if my pool is hosted on HDD or SSD.
>>> It doesn't matter if I am using 1 RGW or 3 RGW nodes.
>>> It doesn't matter if I am using 1 client or 5 clients.
>>> 
>>> I am constantly limited at around 140-160 objects/s.
>>> 
>>> I see some TCP retransmissions on the RGW node, but maybe that's 'normal'.
>>> 
>>> Any ideas/suggestions?
>>> 
>>> On 2024-06-11 22:08, Anthony D'Atri wrote:
>>>>> I am not sure adding more RGW's will increase the performance.
>>>> That was a tangent.
>>>>> To be clear, that means whatever.rgw.buckets.index ?
>>>>>>> No, sorry my bad. .index is 32 and .data is 256.
>>>>>> Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG replicas on each OSD?  You want (IMHO) to end up with 100-200, keeping each pool's pg_num to a power of 2 ideally.
>>>>> No, my RBD pool is larger. My average PG per OSD is around 60-70.
>>>> Ah.  Aim for 100-200 with spinners.
>>>>>> Assuming all your pools span all OSDs, I suggest at a minimum 256 for .index and 8192 for .data, assuming you have only RGW pools.  And I would be inclined to try 512 / 8192.  Assuming your other minor pools are at 32, I'd bump .log and .non-ec to 128 or 256 as well.
>>>>>> If you have RBD or other pools colocated, those numbers would change.
>>>>>> ^ above assume disabling the autoscaler
>>>>> I bumped my .data pool from 256 to 1024 and .index from 32 to 128.
>>>> Your index pool still only benefits from half of your OSDs with a value of 128.
>>>>> Also doubled the .non-ec and .log pools. Performance-wise I don't see any improvement. If I saw a 10-20% improvement, I would definitely increase it to 512 / 8192.
>>>>> With 0.5MB object size I am still limited to about 150-250 objects/s.
>>>>> The disks aren't saturated. The wr await is mostly around 1ms and does not get higher when benchmarking with S3.
>>>> Trust iostat about as far as you can throw it.
>>>>> Other suggestions, or does anyone else have suggestions?
>> 
> 
> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



