Hello Ceph users,

We've been running some tests with Ceph RGW, mainly to see how Ceph handles a large number of objects in a single bucket.

Test cluster: 3 nodes, each running co-located OSDs, MON, MGR, and RGW.

CPU: 2x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 threads in total)
RAM: 128 GB
Network: 4 x 10 Gbps in a single LACP bond
OSD drives: 2 x 800 GB NVMe, write intensive
Ceph version: Nautilus (14.2.9)

The data pool uses replica 3, and Ceph is running with only default configuration values.

We ran a COSBench test with 200 threads, writing 100 million objects of 4 KB each, and noticed that performance started at ~500 ops/sec, then repeatedly jumped above 7,000 ops/sec and dropped back below 500 ops/sec:

https://i.imgur.com/TopM6sw.png
https://i.imgur.com/y5Mu9F3.png

All the data collected from the COSBench test:
https://docs.google.com/spreadsheets/d/1wAwrg9nE2e_MItQB5wVrmLIO-KH7hkUBtz06YpFQtXA/edit?usp=sharing

We noticed that during the low-performance periods the cluster is doing read I/O and response times are very high:

https://i.imgur.com/tgZ5WLF.png
https://i.imgur.com/2PiEGZB.png

Here are the I/O stats for the index and data pools:

https://i.imgur.com/hC3HZ1R.png
https://i.imgur.com/TwsXghv.png

We repeated the test multiple times on clean clusters, with similar results. We also ran a second test, writing 50 million new objects to the same bucket we had previously filled with 100 million objects, and everything worked perfectly.

Does anyone know why this is happening? The response times are huge and would be a disaster in a production environment!

Thank you,

---
Alex Cucu
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
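P.S. For anyone wanting to reproduce this, here is a quick back-of-the-envelope sizing of the workload. All numbers are taken straight from the test parameters above; this is plain arithmetic, not anything measured on the cluster:

```python
# Back-of-the-envelope sizing for the COSBench test described above.
# Inputs come from the post; nothing here queries a live cluster.

objects = 100_000_000          # objects written by COSBench
obj_size = 4 * 1024            # 4 KB per object
replicas = 3                   # data pool uses replica 3

logical_bytes = objects * obj_size
raw_bytes = logical_bytes * replicas

gib = 1024 ** 3
print(f"logical data: {logical_bytes / gib:.0f} GiB")   # ~381 GiB
print(f"raw on disk:  {raw_bytes / gib:.0f} GiB")       # ~1144 GiB
```

So the raw footprint is only ~1.1 TiB against 6 x 800 GB of NVMe, i.e. raw capacity is not the constraint here; the unusual part of the workload is simply the 100 million keys landing in a single bucket.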