Hi Gabryel,
Are the pools always using 1X replication? The rados results are
scaling like they're using 1X, but the CephFS results definitely look
suspect. Have you tried turning up the iodepth in addition to tuning
numjobs? Also, is this the kernel cephfs client or fuse? The fuse client is far
slower. FWIW, on our test cluster with NVMe drives I can get about
60-65GB/s for large sequential writes across 80 OSDs (using 100 client
processes with kernel cephfs). It's definitely possible to scale better
than what you are seeing here.
https://docs.google.com/spreadsheets/d/1SpwEk3vB9gWzoxvy-K0Ax4NKbRJwd7W1ip-W-qitLlw/edit?usp=sharing
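If it helps, two quick things to try. You can confirm the replication factor on the pool with something like this (pool name taken from your rados bench command; do the same for your metadata pool):

  ceph osd pool get cephfs_data size

And a deeper-queue variant of your fio line might look roughly like the sketch below; iodepth=16 is just a starting point to experiment with, not a tested recommendation:

  fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4M \
      --iodepth=16 --numjobs=16 --size=1G --group_reporting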
Mark
On 3/30/20 8:56 AM, Gabryel Mason-Williams wrote:
We have been benchmarking CephFS and comparing it to Rados to see the performance difference and how much overhead CephFS has. However, we are getting odd results when using more than one OSD server (each OSD server has only one disk) with CephFS, whereas with Rados everything appears normal. These tests are run on the same Ceph cluster.
OSDS    CephFS (16 threads)    Rados (16 threads)
1       289                    316
2       139                    546
3       143                    728
4       142                    844
CephFS is being benchmarked using: fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4M --numjobs=16 --size=1G --group_reporting
Rados is being benchmarked using: rados bench -p cephfs_data 10 write -t 16
If you could provide some help or insight into why this is happening or how to stop it, that would be much appreciated.
Kind regards,
Gabryel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx