One thing you can check is CPU performance (the CPU governor in particular).
Under such light loads I've seen CPUs sitting in a low-performance state (slower
clocks), giving MUCH worse results than with heavier loads. Try running
"cpupower monitor" on the OSD nodes in a loop and observe the core
frequencies.
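For example (a rough sketch, assuming the cpupower tool and the usual sysfs
cpufreq paths are available on the OSD nodes):

# Show the active governor on every core; "powersave" under light load often means low clocks
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# Sample per-core frequencies and idle states once per second while fio is running
while sleep 1; do cpupower monitor; done

# If the governor is the culprit, forcing "performance" is a quick counter-test
cpupower frequency-set -g performance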
On 2019-03-19 3:17 p.m., jesper@xxxxxxxx wrote:
Hi All.
I'm trying to get a feel for how far we can stretch our Ceph cluster and for
which applications. Parallelism works excellently, but baseline throughput
is - perhaps - not what I would expect it to be.
Luminous cluster running BlueStore - all OSD daemons have 16GB of cache.
Fio job files are attached - 4KB random read and 4KB random write - the test
file is "only" 1GB.
In this I ONLY care about raw IOPS numbers.
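The jobs are roughly of this shape (a sketch rather than the exact attached
files; the file path, iodepth and runtime below are placeholders):

# 4KB random read against the 1GB test file, JSON output for the
# grep/ministat post-processing below; the write job is the same with --rw=randwrite
fio --name=randr --filename=/mnt/rbd/testfile --size=1g \
    --rw=randread --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=1 --runtime=60 --time_based \
    --output-format=json --output=read-1.json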
I have 2 pools, both 3x replicated: one backed by SSDs (14x 1TB Intel S4510)
and one backed by HDDs (84x 10TB).
Network latency from the rbd mount host to one of the OSD hosts:
--- ceph-osd01.nzcorp.net ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9189ms
rtt min/avg/max/mdev = 0.084/0.108/0.146/0.022 ms
SSD:
randr:
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 1727.07 2033.66 1954.71 1949.4789 46.592401
randw:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 36 400.05 455.26 436.58 433.91417 12.468187
The double (or triple) network penalty of course kicks in and delivers lower
throughput here.
Are these performance numbers in the ballpark of what we'd expect?
With a 1GB test file I would really expect the data to be memory-cached in the
OSD/BlueStore cache, and thus deliver read IOPS closer to the theoretical max:
1s/0.108ms => ~9.2K IOPS.
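That figure is simply the average RTT turned into a request rate at queue
depth 1:

# 1 second / 0.108 ms average RTT ~= 9259 round trips per second
echo "1000 / 0.108" | bc -l
# 9259.25925925925925925925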
Again on the write side - all OSDs are backed by a battery-backed write cache,
so writes should go directly into the memory of the controller .. still slower
than reads, due to having to visit 3 hosts .. but not this low?
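(One way to see where the write time actually goes is the latency the OSDs
report themselves:)

# Per-OSD commit/apply latency in ms; with a working BBWC these should sit in the low single digits
ceph osd perf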
Suggestions for improvements? Are other people seeing similar results?
For the HDD tests I get similar - surprisingly slow numbers:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 36.91 118.8 69.14 72.926842 21.75198
This should have the same performance characteristics as the SSDs, since the
writes should be hitting the BBWC.
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 39 26.18 181.51 48.16 50.574872 24.01572
Same here - it should be cached in the BlueStore cache, as that is 16GB x 84
OSDs .. with a 1GB test file.
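(The configured cache size can be double-checked per OSD via the admin socket,
e.g. for osd.0:)

# Effective BlueStore cache settings on one OSD (osd.0 is just an example id)
ceph daemon osd.0 config get bluestore_cache_size
ceph daemon osd.0 config get bluestore_cache_size_hdd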
Any thoughts - suggestions - insights ?
Jesper
--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovhcloud.com