There you go. Tiny objects are the hardest thing for any object storage service: you get space amplification, and metadata operations become a very large portion of the overall workload. With 500 KB objects you may waste a significant fraction of the underlying space -- especially if you have large-IU (indirection unit) QLC OSDs, or OSDs created with an older Ceph release where min_alloc_size defaulted to 64 KB vs. the current 4 KB. This is exacerbated by EC if you're using it, as many do for bucket pools. See the worked example below.

Bluestore Space Amplification Cheat Sheet:
https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?gid=358760253#gid=358760253

Things to do:

- Disable Nagle's algorithm on the RGW frontends (sketch after this list):
  https://docs.ceph.com/en/quincy/radosgw/frontends/

- Put your index pool on as many SSDs as you can; I don't recall whether yours is on HDD now. The index pool doesn't hold much data, but it benefits from a generous pg_num and from spanning multiple OSDs so that it isn't bottlenecked (example commands after this list).
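To make the space amplification concrete, a back-of-the-envelope sketch -- assuming a hypothetical 4+2 EC profile and per-shard allocations rounded up to min_alloc_size, and ignoring RGW head/tail and striping details, so treat the numbers as illustrative:

    16 KiB object on EC 4+2 -> four 4 KiB data shards + two parity shards
      min_alloc_size 64 KiB: 6 x 64 KiB = 384 KiB on disk vs. 24 KiB nominal (16x)
      min_alloc_size  4 KiB: 6 x  4 KiB =  24 KiB on disk (just the 1.5x EC overhead)

The rounding penalty shrinks as shards grow relative to the allocation unit, which is why small objects on large-IU media or old 64 KiB-min_alloc OSDs hurt the most.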
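If you aren't sure what min_alloc_size your OSDs were built with: on recent releases the value is, if I recall correctly, recorded in the OSD metadata, and the config defaults are easy to check:

    ceph osd metadata 0 | grep -i min_alloc            # value baked in at OSD creation
    ceph config get osd bluestore_min_alloc_size_hdd   # default for new HDD OSDs
    ceph config get osd bluestore_min_alloc_size_ssd   # default for new SSD OSDs

Note that changing the setting only affects OSDs created afterward; existing OSDs must be redeployed to pick up the new value.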
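For Nagle, the Beast frontend takes a tcp_nodelay option (see the frontends doc linked above). The port and daemon name here are placeholders -- adjust to your deployment:

    # ceph.conf
    [client.rgw.yourhost]
    rgw_frontends = beast port=8080 tcp_nodelay=1

    # or via the central config store
    ceph config set client.rgw rgw_frontends "beast port=8080 tcp_nodelay=1"

Restart the RGW daemons afterward for the frontend change to take effect.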
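For the index pool, something along these lines -- assuming your flash OSDs carry the "ssd" device class and your zone uses the default pool names; substitute your own:

    # CRUSH rule restricted to SSD-class OSDs
    ceph osd crush rule create-replicated rgw-index-ssd default host ssd
    ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd

    # generous pg_num so the index spans many OSDs; turn the autoscaler
    # off for the pool first so it doesn't undo the change
    ceph osd pool set default.rgw.buckets.index pg_autoscale_mode off
    ceph osd pool set default.rgw.buckets.index pg_num 256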
> On Jun 13, 2024, at 15:13, Sinan Polat <sinan@xxxxxxxx> wrote:
>
> 500K object size
>
>> On 13 Jun 2024, at 21:11, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>>
>> How large are the objects you tested with?
>>
>>> On Jun 13, 2024, at 14:46, sinan@xxxxxxxx wrote:
>>>
>>> I have been doing some further testing.
>>>
>>> My RGW pool is placed on spinning disks.
>>> I created a 2nd RGW data pool, placed on flash disks.
>>>
>>> Benchmarking on the HDD pool:
>>> Client 1 -> 1 RGW node: 150 obj/s
>>> Clients 1-5 -> 1 RGW node: 150 obj/s (30 obj/s per client)
>>> Client 1 -> HAProxy -> 3 RGW nodes: 150 obj/s
>>> Clients 1-5 -> HAProxy -> 3 RGW nodes: 150 obj/s (30 obj/s per client)
>>>
>>> I ran the same tests against the RGW pool on flash disks: same results.
>>>
>>> So it doesn't matter whether my pool is hosted on HDD or SSD.
>>> It doesn't matter whether I use 1 RGW node or 3.
>>> It doesn't matter whether I use 1 client or 5 clients.
>>>
>>> I am consistently limited to around 140-160 objects/s.
>>>
>>> I see some TCP retransmissions on the RGW node, but maybe that's 'normal'.
>>>
>>> Any ideas/suggestions?
>>>
>>> On 2024-06-11 22:08, Anthony D'Atri wrote:
>>>>> I am not sure adding more RGWs will increase the performance.
>>>> That was a tangent.
>>>>> To be clear, that means whatever.rgw.buckets.index?
>>>>>>> No, sorry, my bad. .index is 32 and .data is 256.
>>>>>> Oh, yeah. Does `ceph osd df` show you at the far right something like 4-5 PG replicas on each OSD? You want (IMHO) to end up with 100-200, keeping each pool's pg_num to a power of 2 ideally.
>>>>> No, my RBD pool is larger. My average PG count per OSD is around 60-70.
>>>> Ah. Aim for 100-200 with spinners.
>>>>>> Assuming all your pools span all OSDs, I suggest at a minimum 256 for .index and 8192 for .data, assuming you have only RGW pools. And I would be inclined to try 512 / 8192. Assuming your other minor pools are at 32, I'd bump .log and .non-ec to 128 or 256 as well.
>>>>>> If you have RBD or other pools colocated, those numbers would change.
>>>>>> ^ the above assumes disabling the autoscaler
>>>>> I bumped my .data pool from 256 to 1024 and .index from 32 to 128.
>>>> Your index pool still only benefits from half of your OSDs with a value of 128.
>>>>> Also doubled the .non-ec and .log pools. Performance-wise I don't see any improvement. If I saw a 10-20% improvement, I would definitely increase to 512 / 8192.
>>>>> With 0.5 MB object size I am still limited to about 150-250 objects/s.
>>>>> The disks aren't saturated. The wr await is mostly around 1 ms and does not get higher when benchmarking with S3.
>>>> Trust iostat about as far as you can throw it.
>>>>> Other suggestions, or does anyone else have suggestions?
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx