Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?

Avi Kivity <avi@xxxxxxxxxxxx> · Sun, 12 Jan 2020 17:46:25 +0200

On 10/01/2020 21.54, Radoslaw Zarzynski wrote:
Hi Roman,

On Thu, Jan 9, 2020 at 2:51 PM Roman Penyaev <rpenyaev@xxxxxxx> wrote:
The first thing that catches my eye is that for small blocks there is no
big difference at all, but as the block size increases, crimson's IOPS
start to decline. Could it be a transport issue? That can be tested as well.
This is a known issue with Seastar's POSIX-based network
stack. As Kefu pointed out, even large payloads are retrieved from
the kernel in multiple small, fixed-size chunks. That's a matter of how
the internal interfaces were shaped. My personal impression is that their
design favors the native stack / DPDK while avoiding differentiated
behavior among the stacks (likely so as not to surprise developers).

The goal is not to require different optimization techniques or code 
paths for the two stacks. And it is a challenge to do this well.

I guess we need a way to provide the state machine to 
seastar::input_stream so that it can provide the expected data layout to 
data_source. Maybe we need to bypass input_stream completely - it tries 
to buffer data by itself but here the application knows better. So 
perhaps we need to interact with data_source instead. The application 
would provide the buffers for the data_source, and seastar would take 
care of the copying if the data_source cannot use the buffer directly 
(the native stack case).
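
A minimal sketch of that idea, under loud assumptions: `BufferFillingSource`, `DirectSource`, `PacketSource` and `read_exactly` are hypothetical names invented for illustration and are not Seastar's `data_source` or `input_stream` API. The application allocates a buffer of the size it expects; a POSIX-like source writes into it directly, while a native-stack-like source, whose bytes live in its own packets, performs the copy on the application's behalf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface (NOT Seastar's): the caller owns the buffer.
struct BufferFillingSource {
    virtual ~BufferFillingSource() = default;
    // Writes up to `len` bytes into `dst`; returns bytes written, 0 at end.
    virtual std::size_t fill(char* dst, std::size_t len) = 0;
};

// POSIX-like case: bytes can land straight in the caller's buffer,
// much as read(2) would place them there.
class DirectSource : public BufferFillingSource {
    std::string stream_;
    std::size_t pos_ = 0;
public:
    explicit DirectSource(std::string s) : stream_(std::move(s)) {}
    std::size_t fill(char* dst, std::size_t len) override {
        assert(dst != nullptr || len == 0);
        std::size_t n = std::min(len, stream_.size() - pos_);
        std::memcpy(dst, stream_.data() + pos_, n);
        pos_ += n;
        return n;
    }
};

// Native-stack-like case: data sits in packets owned by the stack,
// so the library copies from them into the caller's buffer.
class PacketSource : public BufferFillingSource {
    std::vector<std::string> packets_;
    std::size_t idx_ = 0;  // current packet
    std::size_t off_ = 0;  // offset inside the current packet
public:
    explicit PacketSource(std::vector<std::string> p) : packets_(std::move(p)) {}
    std::size_t fill(char* dst, std::size_t len) override {
        assert(dst != nullptr || len == 0);
        std::size_t written = 0;
        while (written < len && idx_ < packets_.size()) {
            const std::string& p = packets_[idx_];
            std::size_t n = std::min(len - written, p.size() - off_);
            std::memcpy(dst + written, p.data() + off_, n);
            written += n;
            off_ += n;
            if (off_ == p.size()) { ++idx_; off_ = 0; }
        }
        return written;
    }
};

// Application side: it knows the layout, so it sizes the buffer itself.
std::string read_exactly(BufferFillingSource& src, std::size_t len) {
    std::string buf(len, '\0');
    std::size_t got = 0;
    while (got < len) {
        std::size_t n = src.fill(&buf[0] + got, len - got);
        if (n == 0) break;  // source exhausted
        got += n;
    }
    buf.resize(got);
    return buf;
}
```

With either source, the application code is identical: `read_exactly(src, 4)` for a 4-byte header followed by `read_exactly(src, payload_len)` yields flat buffers, which is the "same code path for both stacks" property discussed above.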

Moreover, crimson-osd adds an extra memcpy on top of Seastar to
consolidate those tiny chunks into a flat buffer.
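
To make the cost concrete, here is a rough sketch that counts read calls and re-copied bytes for the two approaches. Everything in it is an assumption for illustration: `ByteSource` is an in-memory stand-in for a socket, the 4096-byte chunk size is arbitrary, and none of this is Seastar or crimson code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a socket-like byte source.
struct ByteSource {
    std::string data;
    std::size_t pos = 0;
    // Copies up to `len` bytes into `dst`; returns bytes copied (0 at end).
    std::size_t read(char* dst, std::size_t len) {
        std::size_t n = std::min(len, data.size() - pos);
        std::memcpy(dst, data.data() + pos, n);
        pos += n;
        return n;
    }
};

struct ReadStats {
    std::size_t read_calls = 0;      // how often the source was asked for data
    std::size_t bytes_recopied = 0;  // bytes copied a second time to flatten
};

// Path A: fetch fixed-size chunks, then memcpy them into one flat buffer.
std::string read_chunked_then_flatten(ByteSource& src, std::size_t payload,
                                      std::size_t chunk, ReadStats& st) {
    assert(chunk > 0);
    std::vector<std::string> chunks;
    std::size_t remaining = payload;
    while (remaining > 0) {
        std::string buf(std::min(chunk, remaining), '\0');
        std::size_t n = src.read(&buf[0], buf.size());
        ++st.read_calls;
        if (n == 0) break;  // source exhausted
        buf.resize(n);
        remaining -= n;
        chunks.push_back(std::move(buf));
    }
    std::string flat;
    flat.reserve(payload);
    for (const std::string& c : chunks) {
        flat.append(c);  // the additional copy
        st.bytes_recopied += c.size();
    }
    return flat;
}

// Path B: the caller knows the payload size and reads into a flat buffer.
std::string read_into_flat(ByteSource& src, std::size_t payload, ReadStats& st) {
    std::string flat(payload, '\0');
    std::size_t got = 0;
    while (got < payload) {
        std::size_t n = src.read(&flat[0] + got, payload - got);
        ++st.read_calls;
        if (n == 0) break;  // source exhausted
        got += n;
    }
    flat.resize(got);
    return flat;
}
```

For a 64 KiB payload and 4 KiB chunks, path A issues 16 reads and copies all 65536 bytes a second time, while path B issues a single read with no re-copy. A real socket may of course return short reads, so the single-read figure is a best case.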

Here is a more detailed gist:
   https://gist.github.com/rzarzynski/a1d67dc39b0ef4d49cb522179b1f3c89

There are branches (for both Seastar and crimson) with PoC for
the "input buffer factory" that targets those issues. Performance
comparison is here:
   https://gist.github.com/rzarzynski/ad0aaa80b26603bc1a803ce0d209ac87

Also, when narrowing the comparison to async-msgr vs crimson-msgr
(with ibf) I wouldn't expect too much of a difference. In read tests we're
observing pretty similar IPC for both crimson-osd and msgr-worker-n
(single thread profiling). The thing that might change a lot is the native
stack. In Intel's testing it significantly (up to 30-40% IIRC) improved
IPC of crimson-msgr. Combined with good SPDK support in Seastore,
it might render the POSIX stack (and thus the need for ibf) a bit obsolete.

Quick note on the saturation: please be careful when judging it with
top, pidstat or even perf stat. In contrast to the legacy OSD, Seastar
does busy-wait for a while. This greatly exaggerates the CPU utilisation
for modest workloads. And yes, **crimson is all about the computational
efficiency**. We're much more interested in cycles/op than in raw IOPS,
to be honest.
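
The distinction can be illustrated with a toy model. The struct, its fields and any numbers fed into it are made up for illustration; in particular, it assumes work cycles can be separated from polling cycles, which top, pidstat and a plain perf stat do not do for you.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical breakdown of one core's cycles over a sampling window.
struct CoreSample {
    std::uint64_t work_cycles;  // cycles spent actually processing ops
    std::uint64_t poll_cycles;  // cycles spent busy-polling with nothing to do
    std::uint64_t idle_cycles;  // cycles during which the thread slept
    std::uint64_t ops;          // operations completed in the window
};

// What a top-like tool reports: every non-sleeping cycle counts as busy.
double apparent_utilisation(const CoreSample& s) {
    std::uint64_t total = s.work_cycles + s.poll_cycles + s.idle_cycles;
    assert(total > 0);
    return static_cast<double>(s.work_cycles + s.poll_cycles) /
           static_cast<double>(total);
}

// The efficiency metric: useful cycles per completed operation.
double cycles_per_op(const CoreSample& s) {
    assert(s.ops > 0);
    return static_cast<double>(s.work_cycles) / static_cast<double>(s.ops);
}
```

Under this model, a thread that sleeps when idle (`{250, 0, 750, 10}`) and one that polls instead (`{250, 750, 0, 10}`) both spend 25 cycles per op, yet the first shows 25% utilisation and the second 100%.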

Regards,
Radek
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx