Hi all,

I'm trying to get my head around which applications we can reasonably stretch our Ceph cluster to cover. Parallelism works excellently, but the single-threaded baseline throughput is - perhaps - not what I would expect it to be.

It is a Luminous cluster running BlueStore; all OSD daemons have 16GB of cache. Fio job files are attached - 4KB random read and 4KB random write - and the test file is "only" 1GB. In this I ONLY care about raw IOPS numbers.

I have two pools, both 3x replicated: one backed by SSDs (14 x 1TB Intel S4510) and one backed by HDDs (84 x 10TB).

Network latency from the rbd mount to one of the OSD hosts:

--- ceph-osd01.nzcorp.net ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9189ms
rtt min/avg/max/mdev = 0.084/0.108/0.146/0.022 ms

SSD pool:

randr:
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
    N           Min           Max        Median           Avg        Stddev
x  38       1727.07       2033.66       1954.71     1949.4789     46.592401

randw:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
    N           Min           Max        Median           Avg        Stddev
x  36        400.05        455.26        436.58     433.91417     12.468187

The double (or triple) network penalty of course kicks in and delivers lower throughput on the writes. Are these performance numbers in the ballpark of what we'd expect?

With a 1GB test file I would really expect the data to be memory-resident in the OSD/BlueStore cache and thus deliver read IOPS closer to the theoretical max of 1s / 0.108ms => ~9.2K IOPS (see the PS below for the arithmetic).

On the write side, all OSDs sit behind a battery-backed write cache, so writes should land directly in controller memory. Writes will of course be slower than reads, since they have to visit three hosts, but surely not this much slower?

Suggestions for improvements? Are other people seeing similar results?

For the HDD pool I get similarly - surprisingly slow - numbers:

randw:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
    N           Min           Max        Median           Avg        Stddev
x  38         36.91         118.8         69.14     72.926842      21.75198

This should have the same write characteristics as the SSD pool, since the writes should be absorbed by the BBWC.

randr:
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
    N           Min           Max        Median           Avg        Stddev
x  39         26.18        181.51         48.16     50.574872      24.01572

Same on the read side - the data should be sitting in the BlueStore cache, given 16GB x 84 OSDs against a 1GB test file.

Any thoughts, suggestions or insights?

Jesper
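PS: A quick sanity check on the "theoretical max" above - the queue-depth-1 ceiling from network RTT alone, ignoring all Ceph and disk service time, so it is strictly an upper bound:

  awk 'BEGIN { rtt_ms = 0.108; printf "%.0f IOPS\n", 1000 / rtt_ms }'   # ~9259 IOPS
  # observed ~1950 read IOPS => 1000/1950 = ~0.51 ms total service time per 4K read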
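And if the runs were captured with fio's --output-format=json (which the grep pattern on the *json files suggests, though I'm guessing), the per-job iops can be pulled out directly with jq instead of the grep/perl step, assuming jq is installed:

  jq '.jobs[].read.iops'  read*json  | ministat -n
  jq '.jobs[].write.iops' write*json | ministat -n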
Attachment:
fio-single-thread-randr.ini
Description: Binary data
Attachment:
fio-single-thread-randw.ini
Description: Binary data
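For list readers who cannot open the attachments: the jobs are single-threaded 4KB random read/write against a 1GB file. A command line along these lines should be close (the ioengine, iodepth, runtime and path are my assumptions - the attached .ini files are authoritative):

  fio --name=randr --filename=/mnt/rbd/testfile --size=1g --bs=4k --rw=randread \
      --ioengine=libaio --direct=1 --iodepth=1 --numjobs=1 \
      --time_based --runtime=60 --output-format=json --output=read-1.json

(same for the write job, with --rw=randwrite)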