Hi Kefu and everyone,
First, thank you Chunmei! Your results helped me catch a bug in my
cycles/op calculation script. I was dividing by IOPS instead of by total
ops, so while the shape of the curves was right, the units were off.
I've now fixed this in the spreadsheet here:
https://docs.google.com/spreadsheets/d/1IR9ysWRkaGdX5e9w_YV8kfeeglR2pEY3V9hovY0ECV8/edit?usp=sharing
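In case it helps anyone double-check their own numbers, the bug amounted to something like this (a minimal Python sketch with made-up values; the variable names are mine, not from the actual script):

```python
# Minimal sketch of the bug with made-up numbers (not from the actual runs).
cpu_cycles = 3.0e12   # total CPU cycles consumed over the whole run
iops = 50_000         # fio-reported IOs per second
runtime_s = 30        # test duration in seconds

total_ops = iops * runtime_s      # 1,500,000 ops completed in total

wrong = cpu_cycles / iops         # buggy: divides by a rate, not a count
right = cpu_cycles / total_ops    # fixed: cycles per op

# The two differ only by a constant factor (the runtime), which is why
# the shape of the curves was right even though the units were off.
assert wrong == right * runtime_s
print(right)  # 2000000.0 cycles/op
```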
Ok, with that out of the way, I'm focusing on the fio numbers since
that's also what I tested. There are some fairly significant differences
in how we ran our respective tests, however.
Looking at Chunmei's first set of numbers, they are for
memstore/cyanstore on a single 2GB RBD volume with a single fio instance
and numjobs=2. She limited the classic OSD to a single async op thread,
a single op shard, and a single op thread per shard. Her tests were also
run for a shorter duration than mine. I've tried to summarize some of
these things below:
Chunmei settings:
Memory Allocator: seastar
RBD volumes: 1
RBD volume size: 2GB
fio numjobs: 2
fio iodepth: 64 (128 total)
fio runtime: 30s
ms_async_op_threads: 1
osd_op_num_threads_per_shard: 1
osd_op_num_shards: 1
pre-fill RBD volumes: ?
server co-located with client: no
Chunmei 4KB randreads
crimson+cyanstore: 50.5K IOPS, 66.5K cycles/op
classic+memstore: 41.1K IOPS, 146.6K cycles/op
Chunmei 4KB randwrites
crimson+cyanstore: 10.1K IOPS, 303.6K cycles/op
classic+memstore: 10.6K IOPS, 426.6K cycles/op
Almost all of my tests have been focused on alienstore. The only
cyanstore tests I have were fairly early on using the default memory
allocator. I was seeing extremely rapid memory growth using seastar's
memory allocator when using larger volume sizes (200GB+!). I did test
various IO depths, and I do have an iodepth=64 result, but it's across 4
fio clients rather than 1, and with numjobs=1 instead of 2. Probably the
closest equivalent is iodepth=32, so we both have a total of 128
outstanding IOs from the clients.
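As a quick sanity check on that equivalence (pure arithmetic; the names below are mine, not from any test harness):

```python
# Total outstanding client IO = fio clients * numjobs * iodepth
chunmei_qd = 1 * 2 * 64   # 1 fio instance, numjobs=2, iodepth=64
mark_qd    = 4 * 1 * 32   # 4 fio clients, numjobs=1, iodepth=32

assert chunmei_qd == mark_qd == 128
print(chunmei_qd, mark_qd)  # 128 128
```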
Mark settings:
Memory Allocator: default
RBD volumes: 4
RBD volume size: 16GB
fio numjobs: 1
fio iodepth: 32 (128 total)
fio runtime: 300s
ms_async_op_threads: 3 (default)
osd_op_num_threads_per_shard: 2 (SSD default)
osd_op_num_shards: 8 (SSD default)
pre-fill RBD volumes: yes, 4MB writes
server co-located with client: yes
Mark 4KB randreads:
crimson+cyanstore: 89.2K IOPS, 41.3K cycles/op
classic+memstore: 166.1K IOPS, 142.2K cycles/op
Mark 4KB randwrites
crimson+cyanstore: 6.7K IOPS, 550.5K cycles/op
classic+memstore: 23.7K IOPS, 1356.9K cycles/op
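Since raw IOPS are so sensitive to the setup differences, one rough way to compare across both sets of runs is the classic/crimson cycles-per-op ratio. A quick back-of-the-envelope (just restating the figures already listed above):

```python
# cycles/op figures from the two sets of runs above (in thousands)
results = {
    "chunmei randread":  {"crimson": 66.5,  "classic": 146.6},
    "chunmei randwrite": {"crimson": 303.6, "classic": 426.6},
    "mark randread":     {"crimson": 41.3,  "classic": 142.2},
    "mark randwrite":    {"crimson": 550.5, "classic": 1356.9},
}

for name, r in results.items():
    ratio = r["classic"] / r["crimson"]
    print(f"{name}: classic uses {ratio:.1f}x the cycles/op of crimson")
```

In every case the ratio is above 1, though how far above varies a lot with the settings.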
To be honest, these tests were run so differently that I'm not sure we
can really directly compare the results. Seastar's memory allocator
alone has a fairly large effect on efficiency/performance, not to
mention the other differences in settings here. Here are my rough
takeaways when comparing the results:
1) crimson+cyanstore is generally more efficient than classic+memstore
2) both are far less efficient for writes than reads (but we knew that)
3) classic+memstore appears to benefit from having additional op
threads/shards
4) crimson+cyanstore is likely faster with seastar's memory allocator
(but it's causing memory growth problems for Mark)
5) both seem to benefit from increased IO depth, but in different ways
(see my smaller iodepth tests)
Also, based on previous work I've done, I'm not sure classic memstore is
really a good target to test against, at least without some
modifications. I'll see if I can get some of my old changes merged
upstream that I believe might make it faster and more efficient.
Mark
On 2/10/21 1:00 AM, Kefu Chai wrote:
hi Mark and Radek,
i am sending this mail for further discussion on our recent perf tests
on crimson.
Chunmei also ran performance tests comparing classic osd + memstore
and crimson osd + cyanstore using "rados bench" and fio, where
- only a single async_op thread and a single core are designated to
the classic osd,
- two rados bench instances were used when testing with "rados bench",
- two jobs were used when testing with fio,
- the server is not co-located with the client,
- a single osd instance was used.
see
- https://gist.github.com/liu-chunmei/4fd88fd0ff56d6849439a2df329aa80e
- https://gist.github.com/liu-chunmei/f696b9c4f31b123fb223cdd47f13c8ea
respectively.
her findings are
- in the rados bench tests, crimson performs better than classic osd
in general.
- in the fio tests, the performance of crimson is almost on par with
that of classic osd. the cycles-per-op of crimson is significantly
lower than that of classic osd.
but in the last standup, she mentioned that her impression was that
alienstore does not benefit from adding more than 2 threads. this does
not match your recent findings.
thoughts?
cheers,
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx