Hi Vitali,
Sorry for the late response, I was on vacation.
On 2020-07-20 00:44, vitalif@xxxxxxxxxx wrote:
> Hi Roman,
> It's always really interesting to read your messages :) maybe you'll
> join our telegram chat @ceph_ru? One of your colleagues is there :)
Do you promise a lot of fun? ;)
> Client-based replication is of course the fastest,
With this comparison I answer the question: if it is the fastest, then by
how much. Because the obvious "of course it is the fastest" is, well, not
very well reasoned :)
> but the problem is that it's unclear how to provide consistency with
> it.
Here I tried to highlight client-based replication problems:
https://lists.ceph.io/hyperkitty/list/dev@xxxxxxx/thread/N46NR7NBHWBQL4B2ASU7Y2LMKZZPK3IX/
And yes, it comes with restrictions, but there are scenarios which do not
require the strong sequential consistency of log-based replication: e.g.
if you run 1 rbd client per 1 image, with a filesystem on top that does
journaling and enforces strong request ordering, why not rely on the
filesystem's recovery mechanisms? That is the question which has bothered
me for quite a while, and that is exactly the reason why I started the
Pech OSD: to find some answers.
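To make the client-based replication idea a bit more concrete, here is a
toy sketch (not Pech code, just an illustration, with local files standing
in for replica OSDs): the client itself fans the write out to all replicas
in parallel and completes it only after every replica has acked; ordering
and crash recovery are left to the layer above, e.g. a journaling
filesystem.

/* Toy client-based replication fan-out: local files stand in for the
 * replica OSDs; a real client would send the write over the network.
 * build: cc -pthread client_fanout.c */
#include <sys/types.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NR_REPLICAS 3

struct replica_write {
	const char *path;   /* stand-in for a replica OSD address */
	const void *buf;    /* data to replicate */
	size_t      len;
	int         ret;    /* 0 on success, -1 on failure */
};

static void *replica_write_fn(void *arg)
{
	struct replica_write *w = arg;
	int fd = open(w->path, O_CREAT | O_WRONLY | O_TRUNC, 0644);

	if (fd < 0 || write(fd, w->buf, w->len) != (ssize_t)w->len || fsync(fd))
		w->ret = -1;
	else
		w->ret = 0;
	if (fd >= 0)
		close(fd);
	return NULL;
}

int main(void)
{
	/* Local files stand in for the three replica OSDs. */
	const char *replicas[NR_REPLICAS] = { "osd0.img", "osd1.img", "osd2.img" };
	struct replica_write w[NR_REPLICAS];
	pthread_t t[NR_REPLICAS];
	char buf[4096];
	int i, failed = 0;

	memset(buf, 'A', sizeof(buf));

	/* Fan-out: the client issues the write to all replicas in parallel. */
	for (i = 0; i < NR_REPLICAS; i++) {
		w[i] = (struct replica_write){ replicas[i], buf, sizeof(buf) };
		pthread_create(&t[i], NULL, replica_write_fn, &w[i]);
	}
	/* The client write completes only when every replica has acked. */
	for (i = 0; i < NR_REPLICAS; i++) {
		pthread_join(t[i], NULL);
		failed |= (w[i].ret != 0);
	}
	printf("write %s\n", failed ? "FAILED" : "acked by all replicas");
	return failed;
}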
> Maybe it's
> possible with some restrictions, but... I don't think it's possible in
> Ceph :)
No, for sure not, not with RADOS strong consistency semantics. But why
should that stop you? Take what you need and cut what is useless to fit
your requirements.
> By the way, have you tested their Crimson OSD?
Yes, I did. But with the Crimson OSD everything went wrong: the first
problem I came across is that I was not able to reach the desired number
of 120 running OSDs: after various restarts the average number I got from
the monitor was ~50. I did not try to debug it and simply reduced the
number of OSDs to 35 (5 hosts, 7 OSDs on each) and reran all the tests,
so here are the results for all types of OSDs (for a fair comparison):
35OSDS/crimson/primary-copy
  bs      write/iops    write/bw       write/clat_ns/mean
  4k      3.50 K        14.30 MB/s     888.00 ms
  8k      4.97 K        40.37 MB/s     687.27 ms
  16k     4.90 K        79.63 MB/s     709.63 ms
  32k     4.48 K        145.50 MB/s    703.80 ms
  64k     4.46 K        290.72 MB/s    731.06 ms
  128k    4.38 K        570.44 MB/s    720.32 ms
  256k    4.13 K        1.05 GB/s      755.77 ms
  512k    2.56 K        1.32 GB/s      1.15 s
  1m      1.16 K        1.24 GB/s      2.22 s
35OSDS/ceph/primary-copy
  bs      write/iops    write/bw       write/clat_ns/mean
  4k      90.74 K       355.92 MB/s    44.51 ms
  8k      75.03 K       588.98 MB/s    53.45 ms
  16k     92.58 K       1.42 GB/s      42.10 ms
  32k     122.95 K      3.76 GB/s      33.27 ms
  64k     101.28 K      6.20 GB/s      39.45 ms
  128k    45.28 K       5.57 GB/s      83.26 ms
  256k    26.08 K       6.44 GB/s      138.16 ms
  512k    14.49 K       7.23 GB/s      254.74 ms
  1m      5.91 K        6.05 GB/s      588.28 ms
35OSDS/pech/primary-copy
  bs      write/iops    write/bw       write/clat_ns/mean
  4k      289.22 K      1.10 GB/s      14.94 ms
  8k      231.93 K      1.77 GB/s      15.94 ms
  16k     228.60 K      3.49 GB/s      17.28 ms
  32k     208.95 K      6.39 GB/s      19.08 ms
  64k     106.66 K      6.53 GB/s      37.69 ms
  128k    53.48 K       6.57 GB/s      73.03 ms
  256k    25.03 K       6.19 GB/s      139.59 ms
  512k    12.63 K       6.32 GB/s      302.50 ms
  1m      5.91 K        6.05 GB/s      650.03 ms
I did not notice anything strange in Crimson's logs and did not try to
debug further, so I do not know why the results are so bad in the Crimson
case.
> Is it any faster than
> current implementation? (regarding iodepth=1 fsync=1 latency)
My original goal was to test a real distributed load: many OSD hosts,
many client hosts (I was keen to see how Pech behaves). Your "latency"
load does not require a cluster setup and can be executed on localhost
with 3 OSDs (x3 replication), so here are the results:
"-o ms_crc_data=false -o debug_osd=0 -o debug_ms=0"
rbd.fio:
  rw=randwrite
  iodepth=1
  numjobs=1
  runtime=10
  size=256m
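For reference, a local 3-OSD cluster with the "-o" options above can be
brought up with vstart.sh from the build directory, roughly like this
(a sketch; the exact flags and env vars may differ between branches):

MON=1 MGR=1 OSD=3 MDS=0 ../src/vstart.sh -n -x -d \
    -o ms_crc_data=false -o debug_osd=0 -o debug_ms=0

And the full fio job file looks roughly like this (also a sketch: the rbd
ioengine bits, i.e. the client, pool and image names, are placeholders
rather than the exact values used; bs was swept from 4k to 1m between
runs):

[global]
# fio librbd engine; client/pool/image names below are placeholders
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
rw=randwrite
iodepth=1
numjobs=1
runtime=10
size=256m

[randwrite-test]
# bs was varied per run: 4k .. 1m
bs=4k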
/// crimson-osd
4k IOPS=101, BW=406KiB/s, Lat=9846.09usec
8k IOPS=100, BW=802KiB/s, Lat=9973.39usec
16k IOPS=99, BW=1599KiB/s, Lat=10000.17usec
32k IOPS=96, BW=3088KiB/s, Lat=10355.56usec
64k IOPS=591, BW=36.0MiB/s, Lat=1687.63usec
128k IOPS=508, BW=63.6MiB/s, Lat=1963.95usec
256k IOPS=379, BW=94.9MiB/s, Lat=2632.29usec
512k IOPS=338, BW=169MiB/s, Lat=2952.39usec
1m IOPS=166, BW=166MiB/s, Lat=6011.29usec
/// ceph-osd
4k IOPS=1908, BW=7634KiB/s, Lat=522.07usec
8k IOPS=1838, BW=14.4MiB/s, Lat=542.10usec
16k IOPS=1751, BW=27.4MiB/s, Lat=568.98usec
32k IOPS=2048, BW=64.0MiB/s, Lat=486.48usec
64k IOPS=1985, BW=124MiB/s, Lat=501.80usec
128k IOPS=1869, BW=234MiB/s, Lat=532.96usec
256k IOPS=1645, BW=411MiB/s, Lat=605.66usec
512k IOPS=1195, BW=598MiB/s, Lat=833.64usec
1m IOPS=704, BW=705MiB/s, Lat=1414.01usec
/// pech-osd
OSD=X; CEPH=~/devel/ceph-upstream
./pech-osd --mon_addrs 192.168.0.97:50001 --server_ip 0.0.0.0 \
    --name $OSD --fsid `cat $CEPH/build/dev/osd$OSD/fsid` \
    --class_dir $CEPH/build/lib --log_level 5 \
    --replication primary-copy --nocrc
4k IOPS=5618, BW=21.9MiB/s, Lat=176.48usec
8k IOPS=5654, BW=44.2MiB/s, Lat=175.26usec
16k IOPS=5504, BW=86.0MiB/s, Lat=180.20usec
32k IOPS=4976, BW=156MiB/s, Lat=199.37usec
64k IOPS=4334, BW=271MiB/s, Lat=229.09usec
128k IOPS=3397, BW=425MiB/s, Lat=292.52usec
256k IOPS=2392, BW=598MiB/s, Lat=416.12usec
512k IOPS=1505, BW=753MiB/s, Lat=661.25usec
1m IOPS=687, BW=688MiB/s, Lat=1446.60usec
The results should be treated carefully, since in practice these numbers
are almost certainly unreachable, but here you are right: the numbers give
a clear upper bound.
--
Roman
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx