Re: slow performance: sanity check


 



On 04/06/2017 01:54 PM, Adam Carheden wrote:
60-80MB/s for what sort of setup? Is that 1GbE rather than 10GbE?

60-80MB/s per disk, assuming fairly standard 7200RPM disks, before any replication takes place, and assuming journals are on SSDs with fast O_DSYNC write performance. Any network limitations may decrease that further. The gist of it: start from a fairly standard ~140-150MB/s of sequential throughput per disk, then assume you only get about half of that once metadata writes, flushes, inode seeks, etc. are accounted for, which lands you right in that 60-80MB/s range.
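As a rough back-of-the-envelope version of that estimate (the numbers below are just the assumptions above, not measurements):

    # Per-disk filestore write estimate (illustrative only).
    raw_disk_mb_s = 145            # ~140-150MB/s sequential for a 7200RPM disk
    filestore_overhead = 0.5       # metadata writes, flushes, inode seeks, etc.
    per_osd_mb_s = raw_disk_mb_s * filestore_overhead
    print(per_osd_mb_s)            # ~72MB/s, i.e. the 60-80MB/s per-disk range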


I consistently get 80-90MB/s bandwidth as measured by `rados bench -p
rbd 10 write` run from a Ceph node on a cluster with:
* 3 nodes
* 4 OSD/node, 600GB 15kRPM SAS disks
* 1GB disk controller write cache shared by all disks in each node
* No SSDs
* 2x1GbE LACP bond for redundancy, no jumbo frames
* 512 PGs for a cluster of 12 OSDs
* All disks in one pool of size=3, min_size=2

IOzone run on a VM using an RBD as its disk confirms that setup maxes out
just under 100MB/s in best-case scenarios, so I assumed the 1GbE network
was the bottleneck.

The network is a good guess. With 3 1GbE nodes and 3x replication you aren't going to do any better than ~110MB/s. You are a little below that, but it's in the right ballpark.
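For reference, that ~110MB/s figure is just 1GbE line rate minus protocol overhead; a rough sanity check (the overhead factor is an assumption):

    # 1GbE wire speed vs. usable client throughput (illustrative numbers).
    line_rate_mb_s = 1000 / 8          # 1Gb/s = 125MB/s raw
    protocol_overhead = 0.10           # assumed TCP/IP + Ethernet framing cost
    usable_mb_s = line_rate_mb_s * (1 - protocol_overhead)
    print(usable_mb_s)                 # ~112MB/s, so ~110MB/s is the practical ceiling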


I'm in the process of planning a hardware purchase for a larger cluster:
more nodes, more drives, SSD journals and 10GbE. I'm assuming I'll get
better performance.

You should, but it can be tricky to balance everything out. Figure that 80MB/s per disk (with 7200RPM disks and SSD journals) is the typical upper limit of what to expect with filestore on XFS, and any additional bottlenecks may bring that down. Some folks have started playing with things like Intel's CAS software to potentially improve those numbers through SSD caching, but it's not a typical setup.
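A minimal sketch of how I'd ballpark a design before buying (the cluster parameters here are hypothetical placeholders, not a recommendation):

    # Hypothetical example cluster; every number below is an assumption.
    nodes = 6
    osds_per_node = 8
    per_osd_mb_s = 80                  # filestore-on-XFS ceiling per disk, as above
    nic_mb_s = 10 * 1000 / 8 * 0.9     # usable 10GbE per node, assuming ~10% overhead
    replication = 3

    # Each client byte lands on 'replication' OSDs, so divide aggregate disk and
    # network capacity by the replication factor (very rough; assumes replica
    # traffic shares the same NICs as client traffic).
    disk_limit = nodes * osds_per_node * per_osd_mb_s / replication
    net_limit = nodes * nic_mb_s / replication

    # A single 10GbE client can never see more than its own link, regardless.
    client_limit = nic_mb_s

    print(min(disk_limit, net_limit, client_limit))  # the smallest is the likely ceiling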


What's the upper bound on Ceph performance for large sequential writes
from a single client with all the recommended bells and whistles (SSD
journals, 10GbE)? I assume it depends on both the total number of OSDs
and possibly OSDs per node if one had enough to saturate the network,
correct?

Yep, and that's sort of tough to answer. The fastest single-client performance I've seen was a little over 4GB/s doing 4MB writes to an RBD volume on 16 NVMe OSDs using 40GbE (i.e. maxing it out on the client). If I had enough switch ports to bond the links I could probably have gotten closer to 8GB/s, since the cluster was capable of it.
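That result lines up with the client NIC being the limit; the same kind of rough arithmetic as above (illustrative only):

    # 40GbE line rate vs. the observed single-client throughput.
    line_rate_gb_s = 40 / 8                # 40GbE = 5GB/s raw
    observed_gb_s = 4.0                    # "a little over 4GB/s" with 4MB writes
    print(observed_gb_s / line_rate_gb_s)  # ~0.8: roughly saturating the client link
    print(2 * observed_gb_s)               # two bonded 40GbE links -> ~8GB/s, as noted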

Having said that, there are a *lot* of ways to hurt performance. Red Hat has a reference architecture team that tests various hardware; they might be able to give you a better idea of what works well these days.

Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


