On 19/03/2019 16:17, jesper@xxxxxxxx wrote:
Hi All.
I'm trying to make heads or tails of how far we can stretch our Ceph cluster
and for which applications. Parallelism works excellently, but baseline
throughput is - perhaps - not what I would expect it to be.
Luminous cluster running bluestore - all OSD daemons have 16GB of cache.
Fio job files attached - 4KB random read and 4KB random write - the test
file is "only" 1GB.
In this I ONLY care about raw IOPS numbers.
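For reference, a sketch of the kind of fio invocation behind these numbers
(illustrative only - the path, job name and exact options here are
assumptions, not the attached job files):

# fio --name=randread --filename=/mnt/rbd/testfile --size=1G --rw=randread \
      --bs=4k --ioengine=libaio --direct=1 --iodepth=1 --runtime=60 \
      --time_based --output-format=json --output=read.0.json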
I have 2 pools, both 3x replicated: one backed by SSDs (14x 1TB S4510) and
one backed by HDDs (84x 10TB).
Network latency from the rbd mount host to one of the OSD hosts:
--- ceph-osd01.nzcorp.net ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9189ms
rtt min/avg/max/mdev = 0.084/0.108/0.146/0.022 ms
SSD:
randr:
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 1727.07 2033.66 1954.71 1949.4789 46.592401
randw:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 36 400.05 455.26 436.58 433.91417 12.468187
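For what it's worth, the same numbers can probably be pulled more directly
with jq, assuming fio's JSON layout with a top-level "jobs" array:

# jq '.jobs[].read.iops' read*json | ministat -n
# jq '.jobs[].write.iops' write*json | ministat -n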
The double (or triple) network penalty of course kicks in and delivers a
lower throughput here.
Are these performance numbers in the ballpark of what we'd expect?
With a 1GB test file I would really expect the data to be memory-cached in
the OSD/bluestore cache and thus deliver read IOPS closer to the theoretical
max: 1s / 0.108ms => ~9.2K IOPS.
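In other words, at QD=1 a single client is bounded by roughly one op per
network round trip:

# echo 'scale=0; 1/0.000108' | bc
9259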
Again on the write side - all OSDs are backed by a battery-backed write
cache (BBWC), thus writes should go directly into the memory of the
controller .. still slower than reads - due to having to visit 3 hosts ..
but not this low?
Suggestions for improvements? Are other people seeing similar results?
For the HDD tests I get similar - surprisingly slow - numbers:
# grep iops write*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 38 36.91 118.8 69.14 72.926842 21.75198
This should have the same write performance characteristics as the SSDs, as
the writes should be hitting the BBWC.
# grep iops read*json | grep -v 0.00 | perl -ane'print $F[-1] . "\n"' | cut -d\, -f1 | ministat -n
x <stdin>
N Min Max Median Avg Stddev
x 39 26.18 181.51 48.16 50.574872 24.01572
Same here - it should be cached in the bluestore cache, as there is 16GB x
84 OSDs of cache .. with a 1GB test file.
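The actual per-OSD cache sizing and usage can probably be verified via the
admin socket, e.g. (osd.0 is just an example id; option and command names
assumed as in Luminous):

# ceph daemon osd.0 config get bluestore_cache_size_ssd
# ceph daemon osd.0 dump_mempools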
Any thoughts - suggestions - insights?
Jesper
Cannot comment on the cache issue, hope someone will.
The SSD read latency of 0.5 ms and write latency of 2 ms are in the
ballpark; with Bluestore it is difficult to get below 1 ms for writes.
As suggested, make sure your CPU allows at most one C-state (C1) and that
the P-state minimum frequency is 100%. Also, a CPU with a higher clock (GHz)
would give a better QD=1 latency than a CPU with more cores but a lower
clock.
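For example, something along these lines (cpupower and the intel_pstate
sysfs knob are assumptions about your setup; adjust for your distro):

# cpupower idle-info                      # show which C-states are enabled
# cpupower frequency-set -g performance   # pin the performance governor
# echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
and/or boot with intel_idle.max_cstate=1 processor.max_cstate=1 on the
kernel command line.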
Maged