One thing you can check is CPU performance (the CPU governor in particular).
Under such light loads I've seen CPUs sitting in a low-performance state (slower
clocks), giving MUCH worse results than with heavier loads. Try running
"cpupower monitor" on the OSD nodes in a loop and observe the core
frequencies.
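For example (a rough sketch, assuming the cpupower tool and the usual sysfs
cpufreq paths are available on the OSD nodes):

# Show the active governor on every core; "powersave" under light load often means low clocks
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# Sample per-core frequencies and idle states once per second while fio is running
while sleep 1; do cpupower monitor; done

# If the governor is the culprit, forcing "performance" is a quick counter-test
cpupower frequency-set -g performance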
On 2019-03-19 3:17 p.m., jesper@xxxxxxxx wrote:
Hi All.
I'm trying to get a feel for how far we can stretch our Ceph cluster and for
which applications. Parallelism works excellently, but baseline throughput
is - perhaps - not what I would expect it to be.
Luminous cluster running BlueStore - all OSD daemons have 16GB of cache.
Fio job files are attached - 4KB random read and 4KB random write - the test
file is "only" 1GB.
In this I ONLY care about raw IOPS numbers.
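The jobs are roughly of this shape (a sketch rather than the exact attached
files; the file path, iodepth and runtime below are placeholders):

# 4KB random read against the 1GB test file, JSON output for the
# grep/ministat post-processing below; the write job is the same with --rw=randwrite
fio --name=randr --filename=/mnt/rbd/testfile --size=1g \
    --rw=randread --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=1 --runtime=60 --time_based \
    --output-format=json --output=read-1.json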
I have 2 pools, both 3x replicated: one backed by SSDs (14x 1TB Intel S4510)
and one backed by HDDs (84x 10TB).
Network latency from the rbd mount host to one of the OSD hosts:
--- ceph-osd01.nzcorp.net ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9189ms
rtt min/avg/max/mdev = 0.084/0.108/0.146/0.022 ms
SSD:
randr:
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 1727.07 2033.66 1954.71 1949.4789 46.592401
randw:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 36 400.05 455.26 436.58 433.91417 12.468187
The double (or triple) network penalty of course kicks in and delivers lower
throughput here.
Are these performance numbers in the ballpark of what we'd expect?
With a 1GB test file I would really expect the data to be memory-cached in the
OSD/BlueStore cache, and thus deliver read IOPS closer to the theoretical max:
1s/0.108ms => ~9.2K IOPS.
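That figure is simply the average RTT turned into a request rate at queue
depth 1:

# 1 second / 0.108 ms average RTT ~= 9259 round trips per second
echo "1000 / 0.108" | bc -l
# 9259.25925925925925925925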
Again on the write side - all OSDs are backed by a battery-backed write cache,
so writes should go directly into the memory of the controller .. still slower
than reads, due to having to visit 3 hosts .. but not this low?
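(One way to see where the write time actually goes is the latency the OSDs
report themselves:)

# Per-OSD commit/apply latency in ms; with a working BBWC these should sit in the low single digits
ceph osd perf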
Suggestions for improvements? Are other people seeing similar results?
For the HDD tests I get similar - surprisingly slow numbers:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 36.91 118.8 69.14 72.926842 21.75198
This should have the same performance characteristics as the SSDs, since the
writes should be hitting the BBWC.
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 39 26.18 181.51 48.16 50.574872 24.01572
Same here - it should be cached in the BlueStore cache, as that is 16GB x 84
OSDs .. with a 1GB test file.
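(The configured cache size can be double-checked per OSD via the admin socket,
e.g. for osd.0:)

# Effective BlueStore cache settings on one OSD (osd.0 is just an example id)
ceph daemon osd.0 config get bluestore_cache_size
ceph daemon osd.0 config get bluestore_cache_size_hdd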
Any thoughts - suggestions - insights ?
Jesper
--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovhcloud.com