Re: CephFS performance vs. underlying storage

I was wondering the same. With a 'default' setup I get the performance below; no idea whether this is bad, good or normal.

4k blocks                 random read          random write         sequential read      sequential write
                          lat   iops   kB/s    lat   iops   kB/s    lat   iops   MB/s    lat   iops   MB/s
CephFS SSD, rep. 3        2.78  1781   7297    1.42  700    2871    0.29  3314   13.6    0.04  889    3.64
CephFS SSD, rep. 1        0.54  1809   7412    0.8   1238   5071    0.29  3325   13.6    0.56  1761   7.21
Samsung MZK7KM480 480GB   0.09  10.2k  41600   0.05  17.9k  73200   0.05  18k    77.6    0.05  18.3k  75.1

1024k blocks              random read          random write         sequential read      sequential write
                          lat   iops   MB/s    lat   iops   MB/s    lat   iops   MB/s    lat   iops   MB/s
CephFS SSD, rep. 3        4.3   231    243     0.08  132    139     4.23  235    247     6.99  142    150
CephFS SSD, rep. 1        4.27  233    245     4.34  229    241     4.21  236    248     4.34  229    241
Samsung MZK7KM480 480GB   2.06  482    506     2.16  460    483     1.98  502    527     2.13  466    489

(4 nodes, CentOS 7, Luminous)
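As a rough sanity check on the table's units, throughput should come out close to iops × block size. This assumes kB/s and MB/s are decimal units and that 4k/1024k mean 4096- and 1048576-byte blocks; the post states neither. A small Python sketch using three rows from the table:

```python
# Sanity check for the table above: throughput ~= iops * block size.
# Assumptions (not stated in the post): kB/s and MB/s are decimal units,
# and "4k" / "1024k" mean 4096-byte and 1048576-byte blocks.


def throughput(iops: float, block_bytes: int, unit_bytes: int) -> float:
    """Throughput in units of unit_bytes per second (1000 -> kB/s, 10**6 -> MB/s)."""
    return iops * block_bytes / unit_bytes


# (row, iops from the table, block size, unit size, throughput from the table)
rows = [
    ("CephFS rep. 3, 4k random read", 1781, 4096, 1000, 7297),
    ("CephFS rep. 1, 4k random write", 1238, 4096, 1000, 5071),
    ("Samsung, 1024k sequential read", 502, 1024 * 1024, 10**6, 527),
]

for label, iops, block, unit, reported in rows:
    computed = throughput(iops, block, unit)
    print(f"{label}: computed {computed:.0f}, reported {reported}")
```

For these rows the computed and reported values agree to within about 0.2%, so under those assumptions the columns look internally consistent.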

PS: Not sure why you test with only one node. If you expand to a second node, you might get an unpleasant surprise in the form of a performance drop, because you will be adding network latency, which decreases your IOPS.
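To make the latency point concrete: with a single request outstanding (queue depth 1), each I/O must finish before the next one starts, so IOPS is at most 1 / latency. The latencies in this sketch (0.5 ms on one host, plus a 0.2 ms network round trip) are hypothetical assumptions, not values taken from the table above.

```python
# Hypothetical illustration: at queue depth 1, IOPS <= 1 / per-request latency,
# so any added network round trip directly lowers the achievable IOPS.
# All latency values below are assumed examples, not measurements.


def qd1_iops(latency_ms: float) -> float:
    """Upper bound on IOPS at queue depth 1 for a given per-request latency (ms)."""
    return 1000.0 / latency_ms


local_ms = 0.5        # assumed latency with every replica on the same host
network_rtt_ms = 0.2  # assumed extra round trip to a replica on another node

print(f"single node:      {qd1_iops(local_ms):.0f} IOPS")
print(f"with network hop: {qd1_iops(local_ms + network_rtt_ms):.0f} IOPS")
```

With these assumed numbers the bound drops from 2000 to about 1430 IOPS. Deeper queues hide part of this through parallelism, but each individual request still pays the extra round trip.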



-----Original Message-----
From: Hector Martin [mailto:hector@xxxxxxxxxxxxxx]
Sent: 30 January 2019 19:43
To: ceph-users@xxxxxxxxxxxxxx
Subject: CephFS performance vs. underlying storage

Hi list,

I'm experimentally running single-host CephFS as a replacement for
"traditional" filesystems.

My setup is 8×8TB HDDs using dm-crypt, with CephFS on a 5+2 EC pool. All
of the components are running on the same host (mon/osd/mds/kernel
CephFS client). I've set the stripe_unit/object_size to a relatively
high 80MB (up from the default 4MB). I figure I want individual reads on
the disks to be several megabytes per object for good sequential
performance, and since this is an EC pool 4MB objects would be split
into 800kB chunks, which is clearly not ideal. With 80MB objects, chunks
are 16MB, which sounds more like a healthy read size for sequential
access (e.g. something like 10 IOPS per disk during seq reads).
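The chunk-size reasoning above, as arithmetic: on a k+m erasure-coded pool each data OSD stores object_size / k bytes of every object, so with k = 5 the object size is divided by five. This Python sketch assumes decimal megabytes (which reproduces the 800 kB and 16 MB figures) and a hypothetical 160 MB/s sequential rate per HDD, which the post does not state.

```python
# Chunk-size arithmetic for a 5+2 erasure-coded pool.
# Assumptions: MB = 10**6 bytes; the 160 MB/s per-HDD sequential rate is
# a hypothetical figure chosen for illustration, not from the post.

K_DATA_CHUNKS = 5  # k in the 5+2 profile


def chunk_size_bytes(object_size_bytes: int, k: int = K_DATA_CHUNKS) -> float:
    """Bytes of one object stored on each data OSD."""
    return object_size_bytes / k


def per_disk_iops(disk_mb_per_s: float, chunk_bytes: float) -> float:
    """Reads per second each disk serves when streaming chunk_bytes per read."""
    return disk_mb_per_s * 10**6 / chunk_bytes


for object_mb in (4, 80):
    chunk = chunk_size_bytes(object_mb * 10**6)
    print(f"{object_mb} MB object -> {chunk / 10**6:g} MB per data chunk")

assumed_disk_mb_s = 160  # hypothetical HDD sequential throughput
print(f"{per_disk_iops(assumed_disk_mb_s, chunk_size_bytes(80 * 10**6)):.0f} reads/s per disk")
```

At 160 MB/s per disk and 16 MB per read, each disk serves about 10 reads per second, in line with the estimate in the text.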

With this config, I get about 270MB/s sequential from CephFS. On the
same disks, an ext4 on dm-crypt on dm-raid6 yields ~680MB/s. So it seems
Ceph achieves less than half of the raw performance that the underlying
storage is capable of (with similar RAID redundancy). *

Obviously there will be some overhead with a stack as deep as Ceph
compared to more traditional setups, but I'm wondering if there are
improvements to be had here. While reading from CephFS I do not have
significant CPU usage, so I don't think I'm CPU limited. Could the issue
perhaps be latency through the stack / lack of read-ahead? Reading two
files in parallel doesn't really get me more than 300MB/s in total, so
parallelism doesn't seem to help much.
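One way to frame the latency/read-ahead question is Little's law: sustained throughput equals the amount of data in flight divided by the time each request takes. This is a back-of-the-envelope sketch only; the 20 ms latency and 8 MB in-flight figures are made-up assumptions, not measurements from this setup.

```python
# Little's law sketch: throughput = data in flight / per-request latency.
# The latency and in-flight values below are hypothetical.


def max_throughput_mb_s(in_flight_mb: float, latency_ms: float) -> float:
    """Throughput ceiling (MB/s) when in_flight_mb of data is outstanding."""
    return in_flight_mb / (latency_ms / 1000.0)


def in_flight_needed_mb(target_mb_s: float, latency_ms: float) -> float:
    """Outstanding data (MB) needed to sustain target_mb_s at this latency."""
    return target_mb_s * latency_ms / 1000.0


latency_ms = 20.0   # assumed end-to-end time to complete one read request
in_flight_mb = 8.0  # assumed amount of data the reader keeps outstanding

print(f"ceiling: {max_throughput_mb_s(in_flight_mb, latency_ms):.0f} MB/s")
print(f"needed for 680 MB/s: {in_flight_needed_mb(680, latency_ms):.1f} MB in flight")
```

Under this model, adding disks cannot raise throughput past the ceiling; only more data in flight (larger read-ahead or more concurrent requests) or lower per-request latency does.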

I'm curious as to whether there are any knobs I can play with to try to
improve performance, or whether this level of overhead is pretty much
inherent to Ceph. Even though this is an unusual single-host setup, I
imagine proper clusters might also have similar results when comparing
raw storage performance.

* Ceph has a slight disadvantage here because its chunk of the drives is
logically after the traditional RAID, and HDDs get slower towards higher
logical addresses, but this should be on the order of a 15-20% hit at most.

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
