Re: Performance issues RGW (S3)

Disabling Nagle didn't have any effect.
I created new RGW pools (data, index), both on flash disks. No effect.
I set size=2, no effect.

Btw, cluster is running on Octopus (15.2).

When using 3 MB objects, I am still getting 150 objects/s, just with higher throughput (150 x 3 MB = 450 MB/s). The objects/s figure doesn't increase. It's as if some Ceph configuration setting is limiting it.

On 2024-06-13 21:37, Anthony D'Atri wrote:
There you go.

Tiny objects are the hardest thing for any object storage service:
you can have space amplification and metadata operations become a very
high portion of the overall workload.

With 500KB objects, you may waste a significant fraction of underlying
space -- especially if you have large-IU QLC OSDs, or OSDs made with
an older Ceph release where the min_alloc_size was 64KB vs the current
4KB.  This is exacerbated by EC if you're using it, as many do for
buckets pools.
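As a back-of-the-envelope illustration (my own sketch, not from the cheat
sheet below; the 3x replication and EC 4+2 layouts are just assumptions),
this shows how rounding each chunk up to min_alloc_size inflates raw usage,
especially for small objects:

    import math

    def raw_bytes(obj_size, min_alloc, replicas=3, ec_k=None, ec_m=0):
        """Raw bytes on disk, rounding every chunk up to min_alloc_size."""
        if ec_k:
            chunk = math.ceil(obj_size / ec_k)    # object split into k data chunks
            padded = math.ceil(chunk / min_alloc) * min_alloc
            return padded * (ec_k + ec_m)         # k data + m parity chunks
        padded = math.ceil(obj_size / min_alloc) * min_alloc
        return padded * replicas

    for obj in (16 * 1024, 500 * 1024):           # 16 KiB and 500 KiB objects
        for alloc in (4 * 1024, 64 * 1024):       # min_alloc_size 4K vs legacy 64K
            rep = raw_bytes(obj, alloc)
            ec = raw_bytes(obj, alloc, ec_k=4, ec_m=2)
            print(f"obj={obj//1024}K alloc={alloc//1024}K: "
                  f"3x-rep {rep/obj:.2f}x raw, EC 4+2 {ec/obj:.2f}x raw")

A 16 KiB object on 64K-min_alloc OSDs with EC 4+2 comes out around 24x raw
versus the ~1.5x ideal; at 500 KiB the rounding overhead is much smaller.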

Bluestore Space Amplification Cheat Sheet [1]

Things to do:  Disable Nagle
https://docs.ceph.com/en/quincy/radosgw/frontends/
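With the default Beast frontend that's the tcp_nodelay option, e.g. something
like this (section name and port are placeholders, adjust to your deployment):

    # ceph.conf on the RGW hosts, or the equivalent `ceph config set`
    [client.rgw]
    rgw_frontends = beast port=8080 tcp_nodelay=1

and then restart the radosgw daemons.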

Putting your index pool on as many SSDs as you can would also help; I
don't recall if it's on HDD now.  The index doesn't use all that much
data, but benefits from a generous pg_num and multiple OSDs so that it
isn't bottlenecked.
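For example, something along these lines (the pool name and rule name are
assumptions -- a default-named index pool and OSDs carrying the 'ssd' device
class; adjust to your cluster):

    # replicated CRUSH rule limited to SSD-class OSDs
    ceph osd crush rule create-replicated rgw-index-ssd default host ssd
    ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd
    # spread the index across more OSDs
    ceph osd pool set default.rgw.buckets.index pg_num 256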

On Jun 13, 2024, at 15:13, Sinan Polat <sinan@xxxxxxxx> wrote:

500K object size

On 13 Jun 2024, at 21:11, Anthony D'Atri <aad@xxxxxxxxxxxxxx>
wrote:

How large are the objects you tested with?

On Jun 13, 2024, at 14:46, sinan@xxxxxxxx wrote:

I have been doing some further testing.

My RGW pool is placed on spinning disks.
I created a 2nd RGW data pool, placed on flash disks.

Benchmarking on HDD pool:
Client 1 -> 1 RGW Node: 150 obj/s
Client 1-5 -> 1 RGW Node: 150 obj/s (30 obj/s each client)
Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s
Client 1-5 -> HAProxy -> 3 RGW Nodes: 150 obj/s (30 obj/s each
client)

I did the same tests against the RGW pool on flash disks: same
results.

So, it doesn't matter if my pool is hosted on HDD or SSD.
It doesn't matter if I am using 1 RGW or 3 RGW nodes.
It doesn't matter if I am using 1 client or 5 clients.

I am constantly limited at around 140-160 objects/s.

I see some TCP retransmissions on the RGW node, but maybe that's
'normal'.

Any ideas/suggestions?

On 2024-06-11 22:08, Anthony D'Atri wrote:
I am not sure adding more RGWs will increase the performance.
That was a tangent.
To be clear, that means whatever.rgw.buckets.index ?
No, sorry my bad. .index is 32 and .data is 256.
Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG
replicas on each OSD?  You want (IMHO) to end up with 100-200,
keeping each pool's pg_num to a power of 2 ideally.
No, my RBD pool is larger. My average PG per OSD is around 60-70.
Ah.  Aim for 100-200 with spinners.

Assuming all your pools span all OSDs, I suggest at a minimum 256
for .index and 8192 for .data, assuming you have only RGW pools.
And I would be inclined to try 512 / 8192. Assuming your other
minor pools are at 32, I'd bump .log and .non-ec to 128 or 256 as
well.
If you have RBD or other pools colocated, those numbers would
change.
^ above assume disabling the autoscaler
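For reference, roughly this (pool names assumed to be the defaults, adjust to
yours), with the autoscaler turned off on those pools first:

    ceph osd pool set default.rgw.buckets.index pg_autoscale_mode off
    ceph osd pool set default.rgw.buckets.index pg_num 256
    ceph osd pool set default.rgw.buckets.data pg_autoscale_mode off
    ceph osd pool set default.rgw.buckets.data pg_num 8192
    ceph osd pool set default.rgw.log pg_num 128
    ceph osd pool set default.rgw.buckets.non-ec pg_num 128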
I bumped my .data pool from 256 to 1024 and .index from 32 to 128.
 Your index pool still only benefits from half of your OSDs with a
value of 128.

Also doubled the .non-ec and .log pools. Performance-wise I don't see
any improvement. If I saw a 10-20% improvement, I would definitely
increase it to 512 / 8192.
With 0.5 MB object size I am still limited to about 150-250
objects/s.
The disks aren't saturated. The wr await is mostly around 1ms and
does not get higher when benchmarking with S3.
 Trust iostat about as far as you can throw it.

Other suggestions, or does anyone else have suggestions?



Links:
------
[1] https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?gid=358760253#gid=358760253
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



