CephFS performance vs. underlying storage

Hi list,

I'm experimentally running single-host CephFS as a replacement for
"traditional" filesystems.

My setup is 8×8TB HDDs using dm-crypt, with CephFS on a 5+2 EC pool. All
of the components are running on the same host (mon/osd/mds/kernel
CephFS client). I've set the stripe_unit/object_size to a relatively
high 80MB (up from the default 4MB). I figure I want individual reads on
the disks to be several megabytes per object for good sequential
performance, and since this is a 5+2 EC pool, 4MB objects would be split
into 800kB chunks, which is clearly not ideal. With 80MB objects, chunks
are 16MB, which sounds more like a healthy read size for sequential
access (e.g. on the order of 10 IOPS per disk during sequential reads).
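For concreteness, here's the chunk-size arithmetic behind those numbers
(a sketch using decimal megabytes and k=5 data chunks, matching the 5+2
pool; not output from an actual cluster):

```shell
# In a k+m EC pool, each RADOS object is split into k data chunks,
# so the per-OSD read size for one object is object_size / k.
k=5

default_chunk=$((4 * 1000 * 1000 / k))   # 4 MB objects  -> 800000 bytes/chunk
tuned_chunk=$((80 * 1000 * 1000 / k))    # 80 MB objects -> 16000000 bytes/chunk

echo "$default_chunk $tuned_chunk"
```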

With this config, I get about 270MB/s sequential from CephFS. On the
same disks, an ext4 on dm-crypt on dm-raid6 yields ~680MB/s. So it seems
Ceph achieves less than half of the raw performance that the underlying
storage is capable of (with similar RAID redundancy). *

Obviously there will be some overhead with a stack as deep as Ceph
compared to more traditional setups, but I'm wondering if there are
improvements to be had here. While reading from CephFS I do not have
significant CPU usage, so I don't think I'm CPU limited. Could the issue
perhaps be latency through the stack / lack of read-ahead? Reading two
files in parallel doesn't really get me more than 300MB/s in total, so
parallelism doesn't seem to help much.
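If read-ahead is indeed the bottleneck, one knob I could try (this is an
assumption on my part, not something I've measured yet) is the kernel
client's readahead window, which defaults to 8MiB and is much smaller
than one 80MB object; the per-disk block-device readahead is another.
The mount point, monitor address, and values below are illustrative:

```shell
# Mount the kernel CephFS client with a larger readahead window.
# rasize is in bytes; 80 MiB here roughly matches one object.
mount -t ceph mon-host:/ /mnt/cephfs -o name=admin,rasize=83886080

# Block-device readahead on each OSD's underlying disk can also be
# raised (sdb is an example device; value is in KiB).
echo 8192 > /sys/block/sdb/queue/read_ahead_kb
```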

I'm curious as to whether there are any knobs I can play with to try to
improve performance, or whether this level of overhead is pretty much
inherent to Ceph. Even though this is an unusual single-host setup, I
imagine proper clusters might see similar overhead when compared against
the raw performance of their underlying storage.

* Ceph has a slight disadvantage here because its chunk of the drives is
logically after the traditional RAID, and HDDs get slower towards higher
logical addresses, but this should be on the order of a 15-20% hit at most.

-- 
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



