Re: crimson perf test

Hi Kefu and everyone,


First, thank you Chunmei!  Your results helped me catch a bug in my cycles/op calculation script.  I was using IOPS instead of total ops, so while the shape of the curves in the graphs was right, the units were off.  I've now fixed this in the spreadsheet here:


https://docs.google.com/spreadsheets/d/1IR9ysWRkaGdX5e9w_YV8kfeeglR2pEY3V9hovY0ECV8/edit?usp=sharing
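
For anyone curious, the gist of the fix is just dividing total CPU cycles by the total number of ops completed over the run rather than by the IOPS rate.  A minimal sketch of that math (the function and variable names here are mine for illustration, not the actual script):

# Sketch of the corrected cycles/op calculation (illustrative only).
# The bug was dividing total CPU cycles by IOPS (a rate) rather than
# by the total number of ops completed over the whole run.
def cycles_per_op(total_cpu_cycles, iops, runtime_secs):
    total_ops = iops * runtime_secs        # turn the rate into a count
    return total_cpu_cycles / total_ops    # cycles spent per individual op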


OK, with that out of the way, I'm focusing on the fio numbers since that's also what I tested.  There are some fairly significant differences in how we ran our respective tests, however.  Chunmei's first set of numbers is for memstore/cyanstore on a single 2GB RBD volume with a single fio instance and numjobs=2.  She limited the classic OSD to a single async op thread, a single op shard, and a single op thread per shard.  Her tests were also run for a shorter duration than mine.  I've tried to summarize these differences below:


Chunmei settings:

Memory Allocator: seastar

RBD volumes: 1

RBD volume size: 2GB

fio numjobs: 2

fio iodepth: 64 (128 total)

fio runtime: 30s

ms_async_op_threads: 1

osd_op_num_threads_per_shard: 1

osd_op_num_shards: 1

pre-fill RBD volumes: ?

server co-located with client: no


Chunmei 4KB randreads

crimson+cyanstore: 50.5K IOPS, 66.5K cycles/op

classic+memstore: 41.1K IOPS, 146.6K cycles/op


Chunmei 4KB randwrites

crimson+cyanstore: 10.1K IOPS, 303.6K cycles/op

classic+memstore: 10.6K IOPS, 426.6K cycles/op


Almost all of my tests have been focused on alienstore.  The only cyanstore tests I have were from fairly early on, using the default memory allocator; I was seeing extremely rapid memory growth with seastar's memory allocator when using larger volume sizes (200GB+!).  I did test various IO depths and do have an iodepth=64 result, but it's spread across 4 fio clients rather than 1, and with numjobs=1 instead of 2.  Probably the closest equivalent is iodepth=32, so we both have a total of 128 outstanding IOs from clients.
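
Just to spell out the queue depth arithmetic behind that equivalence, total outstanding client IO is the product of fio instances, numjobs, and iodepth.  A trivial sketch using the numbers from the summaries above:

# Why iodepth=32 across 4 clients matches iodepth=64 with numjobs=2 on one
# client: both land at 128 outstanding IOs.
def total_outstanding_ios(fio_instances, numjobs, iodepth):
    return fio_instances * numjobs * iodepth

print(total_outstanding_ios(1, 2, 64))   # Chunmei: 1 * 2 * 64 = 128
print(total_outstanding_ios(4, 1, 32))   # Mark:    4 * 1 * 32 = 128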


Mark settings:

Memory Allocator: default

RBD volumes: 4

RBD volume size: 16GB

fio numjobs: 1

fio iodepth: 32 (128 total)

fio runtime: 300s

ms_async_op_threads: 3 (default)

osd_op_num_threads_per_shard: 2 (SSD default)

osd_op_num_shards: 8 (SSD default)

pre-fill RBD volumes: yes, 4MB writes

server co-located with client: yes


Mark 4KB randreads:

crimson+cyanstore: 89.2K IOPS, 41.3K cycles/op

classic+memstore: 166.1K IOPS, 142.2K cycles/op


Mark 4KB randwrites

crimson+cyanstore: 6.7K IOPS, 550.5K cycles/op

classic+memstore: 23.7K IOPS, 1356.9K cycles/op



To be honest, these tests were run so differently that I'm not sure we can directly compare the results.  Seastar's memory allocator alone has a fairly large effect on efficiency/performance, not to mention the other differences in settings here.  Here are my rough takeaways when comparing the results:

1) crimson+cyanstore is generally more efficient than classic+memstore
2) both are far less efficient for writes than reads (but we knew that)
3) classic+memstore appears to benefit from having additional op threads/shards
4) crimson+cyanstore is likely faster with seastar's memory allocator (but it's causing memory growth problems for Mark)
5) both seem to benefit from increased io depth, but in different ways (see Mark's smaller io_depth tests)

Also, based on previous work I've done, I'm not sure classic memstore is really a good target to test against, at least not without some modifications.  I'll see if I can get some of my old changes merged upstream; I believe they might make it faster and more efficient.

Mark


On 2/10/21 1:00 AM, Kefu Chai wrote:
hi Mark and Radek,

i am sending this mail for further discussion on our recent perf tests on crimson.

Chunmei also ran performance tests comparing classic osd + memstore and crimson osd + cyanstore using "rados bench" and fio, where:

- only a single async_op thread and a single core were designated to the classic osd
- two rados bench instances were used when testing with "rados bench"
- two jobs were used when testing with fio
- the server was not co-located with the client
- a single osd instance was used

see

- https://gist.github.com/liu-chunmei/4fd88fd0ff56d6849439a2df329aa80e
- https://gist.github.com/liu-chunmei/f696b9c4f31b123fb223cdd47f13c8ea

respectively.

her findings are:

- in the rados bench tests, crimson performs better than classic osd in general.
- in the fio tests, the performance of crimson is almost on par with that of classic osd. the cycles-per-op of crimson is significantly lower than that of classic osd.

but in the last standup, she mentioned that her impression was that the alien store does not benefit from adding more than 2 threads. this does not match your recent findings.

thoughts?

cheers,



