On 19/03/2019 16:17, jesper@xxxxxxxx wrote:
Hi All.
I'm trying to make heads or tails of how far we can stretch our Ceph cluster
and for which applications. Parallelism works excellently, but baseline
throughput is - perhaps - not what I would expect it to be.
Luminous cluster running bluestore - all OSD daemons have 16GB of cache.
Fio job files attached - 4KB random read and 4KB random write - the test
file is "only" 1GB.
In this I ONLY care about raw IOPS numbers.
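For reference, a sketch of the kind of fio invocation behind these numbers
(illustrative only - the path, job name and exact options here are
assumptions, not the attached job files):

# fio --name=randread --filename=/mnt/rbd/testfile --size=1G --rw=randread \
      --bs=4k --ioengine=libaio --direct=1 --iodepth=1 --runtime=60 \
      --time_based --output-format=json --output=read.0.json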
I have 2 pools, both 3x replicated: one backed by SSDs (14x 1TB S4510) and
one backed by HDDs (84x 10TB).
Network latency from the rbd mount host to one of the OSD hosts:
--- ceph-osd01.nzcorp.net ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9189ms
rtt min/avg/max/mdev = 0.084/0.108/0.146/0.022 ms
SSD:
randr:
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 1727.07 2033.66 1954.71 1949.4789 46.592401
randw:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 36 400.05 455.26 436.58 433.91417 12.468187
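For what it's worth, the same numbers can probably be pulled more directly
with jq, assuming fio's JSON layout with a top-level "jobs" array:

# jq '.jobs[].read.iops' read*json | ministat -n
# jq '.jobs[].write.iops' write*json | ministat -n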
The double (or triple) network penalty of course kicks in and delivers a
lower throughput here.
Are these performance numbers in the ballpark of what we'd expect?
With a 1GB test file I would really expect the data to be memory-cached in
the OSD/bluestore cache and thus deliver read IOPS closer to the theoretical
max: 1s / 0.108ms => ~9.2K IOPS.
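In other words, at QD=1 a single client is bounded by roughly one op per
network round trip:

# echo 'scale=0; 1/0.000108' | bc
9259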
Again on the write side - all OSDs are backed by a battery-backed write
cache (BBWC), thus writes should go directly into the memory of the
controller .. still slower than reads - due to having to visit 3 hosts ..
but not this low?
Suggestions for improvements? Are other people seeing similar results?
For the HDD tests I get similar - surprisingly slow - numbers:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 36.91 118.8 69.14 72.926842 21.75198
This should have the same write performance characteristics as the SSDs, as
the writes should be hitting the BBWC.
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 39 26.18 181.51 48.16 50.574872 24.01572
Same here - it should be cached in the bluestore cache, as there is 16GB x
84 OSDs of cache .. with a 1GB test file.
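The actual per-OSD cache sizing and usage can probably be verified via the
admin socket, e.g. (osd.0 is just an example id; option and command names
assumed as in Luminous):

# ceph daemon osd.0 config get bluestore_cache_size_ssd
# ceph daemon osd.0 dump_mempools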
Any thoughts - suggestions - insights?
Jesper
Cannot comment on the cache issue, hope someone will.
The SSD read latency of 0.5 ms and write latency of 2 ms are in the
ballpark; with Bluestore it is difficult to get below 1 ms for writes.
As suggested, make sure your CPU allows at most one C-state (C1) and that
the P-state minimum frequency is 100%. Also, a CPU with a higher clock (GHz)
would give a better QD=1 latency than a CPU with more cores but a lower
clock.
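For example, something along these lines (cpupower and the intel_pstate
sysfs knob are assumptions about your setup; adjust for your distro):

# cpupower idle-info                      # show which C-states are enabled
# cpupower frequency-set -g performance   # pin the performance governor
# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
and/or boot with intel_idle.max_cstate=1 processor.max_cstate=1 on the
kernel command line.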
Maged