Hi Gabryel,
Are the pools always using 1X replication? The rados results are
scaling like they're using 1X, but the CephFS results definitely look
suspect. Have you tried turning up the iodepth in addition to tuning
numjobs? Also, is this the kernel cephfs client or fuse? The fuse client is far
slower. FWIW, on our test cluster with NVMe drives I can get about
60-65GB/s for large sequential writes across 80 OSDs (using 100 client
processes with kernel cephfs). It's definitely possible to scale better
than what you are seeing here.
https://docs.google.com/spreadsheets/d/1SpwEk3vB9gWzoxvy-K0Ax4NKbRJwd7W1ip-W-qitLlw/edit?usp=sharing
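If it helps, two quick things to try. You can confirm the replication factor on the pool with something like this (pool name taken from your rados bench command; do the same for your metadata pool):

  ceph osd pool get cephfs_data size

And a deeper-queue variant of your fio line might look roughly like the sketch below; iodepth=16 is just a starting point to experiment with, not a tested recommendation:

  fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4M \
      --iodepth=16 --numjobs=16 --size=1G --group_reporting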
Mark
On 3/30/20 8:56 AM, Gabryel Mason-Williams wrote:
We have been benchmarking CephFS and comparing it to Rados to see the performance difference and how much overhead CephFS has. However, we are getting odd results when using more than one OSD server (each OSD server has only one disk) with CephFS, whereas with Rados everything appears normal. These tests are run on the same Ceph cluster.
OSDS    CephFS (16 threads)    Rados (16 threads)
1       289                    316
2       139                    546
3       143                    728
4       142                    844
CephFS is being benchmarked using: fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=4M --numjobs=16 --size=1G --group_reporting
Rados is being benchmarked using: rados bench -p cephfs_data 10 write -t 16
If you could provide some help or insight into why this is happening or how to stop it, that would be much appreciated.
Kind regards,
Gabryel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx