Hi Kefu and everyone,
First, thank you Chunmei! Your results helped me catch a bug in my
cycles/op calculation script. I was dividing by IOPS instead of by total
ops, so while the shape of the curves was right, the units were off.
I've now fixed this in the spreadsheet here:
https://docs.google.com/spreadsheets/d/1IR9ysWRkaGdX5e9w_YV8kfeeglR2pEY3V9hovY0ECV8/edit?usp=sharing
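In case it helps anyone double-check their own numbers, the bug amounted to something like this (a minimal Python sketch with made-up values; the variable names are mine, not from the actual script):

```python
# Minimal sketch of the bug with made-up numbers (not from the actual runs).
cpu_cycles = 3.0e12   # total CPU cycles consumed over the whole run
iops = 50_000         # fio-reported IOs per second
runtime_s = 30        # test duration in seconds

total_ops = iops * runtime_s      # 1,500,000 ops completed in total

wrong = cpu_cycles / iops         # buggy: divides by a rate, not a count
right = cpu_cycles / total_ops    # fixed: cycles per op

# The two differ only by a constant factor (the runtime), which is why
# the shape of the curves was right even though the units were off.
assert wrong == right * runtime_s
print(right)  # 2000000.0 cycles/op
```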
Ok, with that out of the way, I'm focusing on the fio numbers since
that's also what I tested. There are some fairly significant differences
in how we ran our respective tests, however.
Looking at Chunmei's first set of numbers, they are for
memstore/cyanstore on a single 2GB RBD volume with a single fio instance
and numjobs=2. She limited the classic OSD to a single async op thread,
a single op shard, and a single op thread per shard. Her tests were also
run for a shorter duration than mine. I've tried to summarize some of
these things below:
Chunmei settings:
Memory Allocator: seastar
RBD volumes: 1
RBD volume size: 2GB
fio numjobs: 2
fio iodepth: 64 (128 total)
fio runtime: 30s
ms_async_op_threads: 1
osd_op_num_threads_per_shard: 1
osd_op_num_shards: 1
pre-fill RBD volumes: ?
server co-located with client: no
Chunmei 4KB randreads
crimson+cyanstore: 50.5K IOPS, 66.5K cycles/op
classic+memstore: 41.1K IOPS, 146.6K cycles/op
Chunmei 4KB randwrites
crimson+cyanstore: 10.1K IOPS, 303.6K cycles/op
classic+memstore: 10.6K IOPS, 426.6K cycles/op
Almost all of my tests have been focused on alienstore. The only
cyanstore tests I have were fairly early on using the default memory
allocator. I was seeing extremely rapid memory growth using seastar's
memory allocator when using larger volume sizes (200GB+!). I did test
various IO depths, and I do have an iodepth=64 result, but it's across 4
fio clients rather than 1, and with numjobs=1 instead of 2. Probably the
closest equivalent is iodepth=32, so we both have a total of 128
outstanding IOs from the clients.
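As a quick sanity check on that equivalence (pure arithmetic; the names below are mine, not from any test harness):

```python
# Total outstanding client IO = fio clients * numjobs * iodepth
chunmei_qd = 1 * 2 * 64   # 1 fio instance, numjobs=2, iodepth=64
mark_qd    = 4 * 1 * 32   # 4 fio clients, numjobs=1, iodepth=32

assert chunmei_qd == mark_qd == 128
print(chunmei_qd, mark_qd)  # 128 128
```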
Mark settings:
Memory Allocator: default
RBD volumes: 4
RBD volume size: 16GB
fio numjobs: 1
fio iodepth: 32 (128 total)
fio runtime: 300s
ms_async_op_threads: 3 (default)
osd_op_num_threads_per_shard: 2 (SSD default)
osd_op_num_shards: 8 (SSD default)
pre-fill RBD volumes: yes, 4MB writes
server co-located with client: yes
Mark 4KB randreads:
crimson+cyanstore: 89.2K IOPS, 41.3K cycles/op
classic+memstore: 166.1K IOPS, 142.2K cycles/op
Mark 4KB randwrites
crimson+cyanstore: 6.7K IOPS, 550.5K cycles/op
classic+memstore: 23.7K IOPS, 1356.9K cycles/op
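Since raw IOPS are so sensitive to the setup differences, one rough way to compare across both sets of runs is the classic/crimson cycles-per-op ratio. A quick back-of-the-envelope (just restating the figures already listed above):

```python
# cycles/op figures from the two sets of runs above (in thousands)
results = {
    "chunmei randread":  {"crimson": 66.5,  "classic": 146.6},
    "chunmei randwrite": {"crimson": 303.6, "classic": 426.6},
    "mark randread":     {"crimson": 41.3,  "classic": 142.2},
    "mark randwrite":    {"crimson": 550.5, "classic": 1356.9},
}

for name, r in results.items():
    ratio = r["classic"] / r["crimson"]
    print(f"{name}: classic uses {ratio:.1f}x the cycles/op of crimson")
```

In every case the ratio is above 1, though how far above varies a lot with the settings.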
To be honest, these tests were run so differently that I'm not sure we
can really directly compare the results. Seastar's memory allocator
alone has a fairly large effect on efficiency/performance, not to
mention the other differences in settings here. Here are my rough
takeaways when comparing the results:
1) crimson+cyanstore is generally more efficient than classic+memstore
2) both are far less efficient for writes than reads (but we knew that)
3) classic+memstore appears to benefit from having additional op
threads/shards
4) crimson+cyanstore is likely faster with seastar's memory allocator
(but it's causing memory growth problems for Mark)
5) both seem to benefit from increased IO depth, but in different ways
(see my smaller iodepth tests)
Also, based on previous work I've done, I'm not sure classic memstore is
really a good target to test against, at least without some
modifications. I'll see if I can get some of my old changes merged
upstream that I believe might make it faster and more efficient.
Mark
On 2/10/21 1:00 AM, Kefu Chai wrote:
hi Mark and Radek,
i am sending this mail for further discussion on our recent perf tests
on crimson.
Chunmei also ran performance tests comparing classic osd + memstore
and crimson osd + cyanstore using "rados bench" and fio, where
- only a single async_op thread and a single core are designated to
the classic osd,
- two rados bench instances were used when testing with "rados bench",
- two jobs were used when testing with fio,
- the server is not co-located with the client,
- a single osd instance was used.
see
- https://gist.github.com/liu-chunmei/4fd88fd0ff56d6849439a2df329aa80e
- https://gist.github.com/liu-chunmei/f696b9c4f31b123fb223cdd47f13c8ea
respectively.
her findings are
- in the rados bench tests, crimson performs better than classic osd
in general.
- in the fio tests, the performance of crimson is almost on par with
that of classic osd. the cycles-per-op of crimson is significantly
lower than that of classic osd.
but in the last standup, she mentioned that her impression was that
alienstore does not benefit from adding more than 2 threads. this does
not match your recent findings.
thoughts?
cheers,
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx