Re: CephFS performance vs. underlying storage

"Marc Roos" <M.Roos@xxxxxxxxxxxxxxxxx> · Wed, 30 Jan 2019 22:13:59 +0100

I was wondering the same. From a 'default' setup I get the performance
below; I have no idea whether it is bad, good or normal.

              Cephfs ssd rep. 3          Cephfs ssd rep. 1          Samsung MZK7KM480 480GB
test          lat   iops   throughput    lat   iops   throughput    lat   iops   throughput
4k r ran.     2.78  1781   7297 kB/s     0.54  1809   7412 kB/s     0.09  10.2k  41600 kB/s
4k w ran.     1.42  700    2871 kB/s     0.8   1238   5071 kB/s     0.05  17.9k  73200 kB/s
4k r seq.     0.29  3314   13.6 MB/s     0.29  3325   13.6 MB/s     0.05  18k    77.6 MB/s
4k w seq.     0.04  889    3.64 MB/s     0.56  1761   7.21 MB/s     0.05  18.3k  75.1 MB/s
1024k r ran.  4.3   231    243 MB/s      4.27  233    245 MB/s      2.06  482    506 MB/s
1024k w ran.  0.08  132    139 MB/s      4.34  229    241 MB/s      2.16  460    483 MB/s
1024k r seq.  4.23  235    247 MB/s      4.21  236    248 MB/s      1.98  502    527 MB/s
1024k w seq.  6.99  142    150 MB/s      4.34  229    241 MB/s      2.13  466    489 MB/s
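One way to read these numbers is as a fraction of what the bare SSD delivers in the same test. A minimal Python sketch, using only the iops values from the table (with the "k" suffix expanded, e.g. 10.2k taken as 10200); the names TESTS, IOPS and fraction_of_raw are made up for this illustration:

```python
# IOPS figures copied from the table above ("k" suffix expanded: 10.2k -> 10200).
TESTS = ["4k r ran.", "4k w ran.", "4k r seq.", "4k w seq.",
         "1024k r ran.", "1024k w ran.", "1024k r seq.", "1024k w seq."]
IOPS = {
    "Cephfs ssd rep. 3": [1781, 700, 3314, 889, 231, 132, 235, 142],
    "Cephfs ssd rep. 1": [1809, 1238, 3325, 1761, 233, 229, 236, 229],
    "Samsung MZK7KM480": [10200, 17900, 18000, 18300, 482, 460, 502, 466],
}


def fraction_of_raw(config, baseline="Samsung MZK7KM480"):
    """Return {test: config_iops / baseline_iops} for every test in TESTS."""
    return {t: c / b for t, c, b in zip(TESTS, IOPS[config], IOPS[baseline])}


if __name__ == "__main__":
    for config in ("Cephfs ssd rep. 3", "Cephfs ssd rep. 1"):
        print(config)
        for test, frac in fraction_of_raw(config).items():
            print(f"  {test:<13} {frac:6.1%} of the raw SSD")
```

This only compares iops. It says nothing about how the tests were run (tool, queue depth, number of jobs), which the post does not state.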

(4 nodes, CentOS7, luminous)

PS: I'm not sure why you test with one node. If you expand to a second
node, you might get an unpleasant surprise: performance can drop, because
you will be adding network latency, which decreases your iops.
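The latency point can be made concrete with a rough model: with one request in flight at a time, iops is about the inverse of the per-request latency. The Python sketch below rests on assumptions the post does not confirm: that the lat column is in milliseconds, and that the tests ran at queue depth 1. The 0.2 ms extra hop is a made-up figure, not a measurement.

```python
def iops_at_queue_depth_1(latency_ms):
    """Serial I/O: one request in flight, so iops is the inverse of latency."""
    return 1000.0 / latency_ms


def iops_with_extra_latency(base_latency_ms, extra_ms):
    """Same model with extra per-request latency (e.g. a network hop) added."""
    return iops_at_queue_depth_1(base_latency_ms + extra_ms)


if __name__ == "__main__":
    base_ms = 0.29  # "4k r seq." latency from the table (unit assumed to be ms)
    hop_ms = 0.2    # hypothetical extra round trip, not a measured value
    print(f"model at {base_ms} ms: {iops_at_queue_depth_1(base_ms):.0f} iops")
    print(f"with +{hop_ms} ms: {iops_with_extra_latency(base_ms, hop_ms):.0f} iops")
```

For the 0.29 ms row the model gives about 3448 iops against the 3314 in the table, so the assumptions look plausible for that row. Not every row fits this well.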

-----Original Message-----
From: Hector Martin [mailto:hector@xxxxxxxxxxxxxx]
Sent: 30 January 2019 19:43
To: ceph-users@xxxxxxxxxxxxxx
Subject: CephFS performance vs. underlying storage

Hi list,

I'm experimentally running single-host CephFS as a replacement for
"traditional" filesystems.

My setup is 8×8TB HDDs using dm-crypt, with CephFS on a 5+2 EC pool. All
of the components are running on the same host (mon/osd/mds/kernel
CephFS client). I've set the stripe_unit/object_size to a relatively
high 80MB (up from the default 4MB). I figure I want individual reads on
the disks to be several megabytes per object for good sequential
performance, and since this is an EC pool 4MB objects would be split
into 800kB chunks, which is clearly not ideal. With 80MB objects, chunks
are 16MB, which sounds more like a healthy read size for sequential
access (e.g. something like 10 IOPS per disk during seq reads).
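The chunk arithmetic above can be sketched in a few lines of Python. It assumes each object is divided evenly across the k=5 data chunks of the 5+2 pool. The function names are made up for this illustration, and the 160 MB/s per-disk rate is an assumed figure chosen to match the "10 IOPS per disk" estimate; it is not a number from the post.

```python
def ec_chunk_size_mb(object_size_mb, k):
    """Size of one data chunk when an object is split evenly over k data chunks."""
    return object_size_mb / k


def chunk_reads_per_second(disk_mb_per_s, chunk_mb):
    """Chunk-sized reads per second one disk serves at a given streaming rate."""
    return disk_mb_per_s / chunk_mb


if __name__ == "__main__":
    K = 5  # data chunks in the 5+2 EC profile
    for object_mb in (4, 80):
        chunk = ec_chunk_size_mb(object_mb, K)
        print(f"{object_mb} MB object -> {chunk * 1000:.0f} kB per chunk")
    # Assumed per-disk streaming rate (hypothetical, not from the post).
    print(chunk_reads_per_second(160, ec_chunk_size_mb(80, K)), "reads/s per disk")
```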

With this config, I get about 270MB/s sequential from CephFS. On the
same disks, an ext4 on dm-crypt on dm-raid6 yields ~680MB/s. So it seems
Ceph achieves less than half of the raw performance that the underlying
storage is capable of (with similar RAID redundancy). *

Obviously there will be some overhead with a stack as deep as Ceph
compared to more traditional setups, but I'm wondering if there are
improvements to be had here. While reading from CephFS I do not have
significant CPU usage, so I don't think I'm CPU limited. Could the issue
perhaps be latency through the stack / lack of read-ahead? Reading two
files in parallel doesn't really get me more than 300MB/s in total, so
parallelism doesn't seem to help much.

I'm curious as to whether there are any knobs I can play with to try to
improve performance, or whether this level of overhead is pretty much
inherent to Ceph. Even though this is an unusual single-host setup, I
imagine proper clusters might also have similar results when comparing
raw storage performance.

* Ceph has a slight disadvantage here because its chunk of the drives is
logically after the traditional RAID, and HDDs get slower towards higher
logical addresses, but this should be on the order of a 15-20% hit at most.
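To put the footnote in numbers, here is a small Python sketch that derates the 680MB/s RAID figure by the quoted 15-20% position penalty before comparing it with the 270MB/s from CephFS. The efficiency function and its position_penalty parameter are made up for this illustration. Treating the penalty as a flat factor on the baseline is a simplifying assumption.

```python
def efficiency(measured_mb_s, baseline_mb_s, position_penalty=0.0):
    """Fraction of the baseline achieved, after derating the baseline by
    position_penalty (0.15 means the baseline would be 15% slower)."""
    return measured_mb_s / (baseline_mb_s * (1.0 - position_penalty))


if __name__ == "__main__":
    CEPHFS_MB_S = 270  # sequential read from CephFS, from the post
    RAID6_MB_S = 680   # ext4 on dm-crypt on dm-raid6, from the post
    for penalty in (0.0, 0.15, 0.20):
        frac = efficiency(CEPHFS_MB_S, RAID6_MB_S, penalty)
        print(f"penalty {penalty:.0%}: CephFS at {frac:.1%} of the baseline")
```

Even with the full 20% penalty applied, CephFS stays just under half of the derated baseline.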

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

