You are likely seeing the effect of bucket index resharding. If you know in advance you are going to have 100M objects in a single bucket, you may want to consider pre-sharding the bucket. By default, dynamic background resharding happens periodically as the bucket grows, and client write IO to the bucket is blocked during each reshard event. We're looking for ways to improve that, but for now I suspect you'll see better write behavior if you just pre-shard the bucket. You'll want to size the shard count from your expected object count and the target number of objects per shard. So if you target 100,000 objects per shard (the default for rgw_max_objs_per_shard) and expect 100M objects, you'll want at least 1,000 shards. Be aware that more shards will slow down bucket listing times, but in Octopus we have a number of enhancements that will help mitigate the performance loss. To pre-shard the bucket:
# radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

For more info, see:
https://docs.ceph.com/docs/master/radosgw/dynamicresharding/
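A rough sketch of that workflow (the bucket name "mybucket" and the 1000-shard target are placeholders based on the calculation above; "reshard process" just forces the queued reshard to run immediately instead of waiting for the background thread, and "reshard status" / "bucket limit check" let you confirm the new shard layout and per-shard object counts):

# radosgw-admin reshard add --bucket mybucket --num-shards 1000
# radosgw-admin reshard process
# radosgw-admin reshard status --bucket mybucket
# radosgw-admin bucket limit check

Doing this before loading data should avoid the mid-ingest reshard events entirely, since the index already has enough shards for the expected object count.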
Mark
On 6/10/20 9:05 AM, Alexandru Cucu wrote:
Hello Ceph users,
We've been doing some tests with Ceph RGW, mostly to see how it handles a
large number of objects in a single bucket.
For the test we had a cluster with 3 nodes, running collocated OSDs,
MON, MGR, and RGW.
CPU: 2x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 threads in total)
RAM: 128 GB
Network: 4 x 10Gbps in a single LACP bond.
OSD drives: 2 x 800GB NVMe Write Intensive
Ceph version: Nautilus (14.2.9)
The data pool uses replica 3, and Ceph is running with only default configuration values.
We ran a COSBench test with 200 threads, writing 100 million 4KB objects,
and noticed that performance started at ~500 ops/sec, then repeatedly
jumped up to more than 7000 ops/sec and dropped back down to less than
500 ops/sec.
https://i.imgur.com/TopM6sw.png
https://i.imgur.com/y5Mu9F3.png
All the collected data from the COSBench test:
https://docs.google.com/spreadsheets/d/1wAwrg9nE2e_MItQB5wVrmLIO-KH7hkUBtz06YpFQtXA/edit?usp=sharing
We noticed that during the low-performance periods the cluster is doing
read IO and the response times are very high:
https://i.imgur.com/tgZ5WLF.png
https://i.imgur.com/2PiEGZB.png
Here are the IO stats for the index and data pools:
https://i.imgur.com/hC3HZ1R.png
https://i.imgur.com/TwsXghv.png
We did the test multiple times on clean clusters, with similar results.
We also ran a second test, writing 50M new objects to the same bucket we
had previously filled with 100M objects, and everything seemed to work
perfectly.
Does anyone know why this is happening?
The response times are huge and would be a disaster in a production environment!
Thank you,
---
Alex Cucu
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx