You are likely seeing the effect of bucket index resharding. If you know in advance you are going to have 100M objects in a single bucket, you may want to consider pre-sharding the bucket. By default, dynamic background resharding happens periodically as the bucket grows, and client write IO to the bucket is blocked during each reshard event. We're looking for ways to improve that, but for now I suspect you'll see better write behavior if you just pre-shard the bucket. You'll want to size the shard count from your expected object count and the target number of objects per shard. So if you target 100,000 objects per shard (the default for rgw_max_objs_per_shard) and expect 100M objects, you'll want at least 1,000 shards. Be aware that more shards will slow down bucket listing times, but in Octopus we have a number of enhancements that will help mitigate the performance loss. To pre-shard the bucket:
# radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

For more info, see:
https://docs.ceph.com/docs/master/radosgw/dynamicresharding/
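A rough sketch of that workflow (the bucket name "mybucket" and the 1000-shard target are placeholders based on the calculation above; "reshard process" just forces the queued reshard to run immediately instead of waiting for the background thread, and "reshard status" / "bucket limit check" let you confirm the new shard layout and per-shard object counts):

# radosgw-admin reshard add --bucket mybucket --num-shards 1000
# radosgw-admin reshard process
# radosgw-admin reshard status --bucket mybucket
# radosgw-admin bucket limit check

Doing this before loading data should avoid the mid-ingest reshard events entirely, since the index already has enough shards for the expected object count.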
Mark
On 6/10/20 9:05 AM, Alexandru Cucu wrote:
Hello Ceph users,
We've been doing some tests with Ceph RGW, mostly to see how it handles a
large number of objects in a single bucket.
For the test we had a cluster with 3 nodes, running collocated OSDs,
MON, MGR, and RGW.
CPU: 2x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 threads in total)
RAM: 128 GB
Network: 4 x 10Gbps in a single LACP bond.
OSD drives: 2 x 800GB NVMe Write Intensive
Ceph version: Nautilus (14.2.9)
The data pool uses replica 3, and Ceph is running with only default configuration values.
We ran a COSBench test with 200 threads, writing 100 million 4KB objects,
and noticed that performance started at ~500 ops/sec, then repeatedly
jumped up to more than 7000 ops/sec and dropped back down to less than
500 ops/sec.
https://i.imgur.com/TopM6sw.png
https://i.imgur.com/y5Mu9F3.png
All the collected data from the COSBench test:
https://docs.google.com/spreadsheets/d/1wAwrg9nE2e_MItQB5wVrmLIO-KH7hkUBtz06YpFQtXA/edit?usp=sharing
We noticed that during the low-performance periods the cluster is doing
read IO and the response times are very high:
https://i.imgur.com/tgZ5WLF.png
https://i.imgur.com/2PiEGZB.png
Here are the IO stats for the index and data pools:
https://i.imgur.com/hC3HZ1R.png
https://i.imgur.com/TwsXghv.png
We did the test multiple times on clean clusters, with similar results.
We also ran a second test, writing 50M new objects to the same bucket we
had previously filled with 100M objects, and everything seemed to work
perfectly.
Does anyone know why this is happening?
The response times are huge and would be a disaster in a production environment!
Thank you,
---
Alex Cucu
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx