Hi Avi,

I responded inline to both of your messages.

> On 13/01/2020 21.56, Radoslaw Zarzynski wrote:
> Well, that's what I would like to avoid. I'd like a developer to know
> that if they are developing with the posix stack, the application would
> work and work well with the native stack, not that they have to retest
> everything.

I see your point. It's a valid one.

> That memcpy would be incurred only if the state machine specified it
> needed its own buffer. If it specified it can run from a stack-provided
> buffer, it would still be zero copy.

If we could make this decision at run-time, then fine. I'm afraid the
application currently has no way to determine which stack is to be used
and to adjust the choice dynamically. If so, supporting both POSIX and
native efficiently (with no trade-offs) would boil down to a
compile-time decision, and thus separate builds. Still, twice the
testing. :-(

> What your proposal does is reuse the read(2) call's copy to userspace
> for the application's purposes - the copy is still there. So the native
> stack isn't handicapped by this copy, it happens in both.

This assumes that crimson will always need to memcpy() the data
retrieved from a network stack. I believe that's not the case.

For the kernel drivers (POSIX stack + kernel's storage) the read() in
Seastar is actually reused to give the kernel an opportunity to remap
pages, instead of doing memcpy(), when the retrieved payload is written
to storage. This is possible because the buffer Seastar reads into has
been properly aligned.

Apart from the read() itself there is no inherent memcpy() on the data
path. When flowing through it, the payload is always conveyed as a
ref-counted scatter-gather list. This holds even if the SGL has only a
single segment, as when using POSIX with the particular implementation
of input_buffer_factory (a minimal sketch of this path is appended at
the end of this mail).

For the user-space drivers (native stack + SPDK) there is a chance to
squeeze out memcpy() / remapping entirely. Of course, this assumes that
the storage HW is able to deal with inflated SGLs efficiently. I hope
vendors can shed more light on that.

From our last discussion I recall your point about the impact that
ref-counting can have on cache density (in the sense of e.g.
BlueStore's cache). For sure the segments of our SGL will be bigger
than the actual payload and will contain metadata or other junk. My
answer would be that the application's caching policy shouldn't belong
to the network layer. It has too little information to judge whether a
given buffer needs to be cached or not. There are many cases where you
won't cache. To exemplify while staying in BlueStore's domain:
`bluestore_default_buffered_write` is `false` by default. That is,
BlueStore **doesn't cache on writes**.

On Tue, Jan 14, 2020 at 12:16 PM Avi Kivity <avi@xxxxxxxxxxxx> wrote:
> The default placing_data_source would just copy data from the original
> data_source to the buffers provided by the user.
>
> The placing_data_source provided by
> posix_data_source_impl::to_placing_data_source() would cooperate with
> the posix stack to read directly into the buffers provided by the user
> (reusing the copy performed by the system call).
>
> If a data movement engine is available, the native stack might program
> it to perform the copy.

Well, this is about off-loading the memcpy() we would introduce to the
native stack. I still think the best option – from the performance
point of view – is to avoid this shuffling at all. The sacrifice for
that would be the maintainability concern coming from differentiated
paths in the network layer. (I appended below a rough sketch of how I
read the proposed interface, to make sure we're talking about the same
thing.)
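For concreteness, here is a minimal sketch of the aligned-buffer,
ref-counted SGL path I described above. seastar::temporary_buffer,
net::fragment and net::packet are real Seastar types, but
make_read_buffer() / as_sgl() and the 4 KiB page size are my
illustrative assumptions, not actual crimson code:

```cpp
#include <seastar/core/temporary_buffer.hh>
#include <seastar/net/packet.hh>

// Page-aligned allocation gives the kernel a chance to remap pages
// instead of memcpy()ing when the payload is later written to storage.
seastar::temporary_buffer<char> make_read_buffer(size_t size) {
    constexpr size_t page_size = 4096;  // assumption: 4 KiB pages
    return seastar::temporary_buffer<char>::aligned(page_size, size);
}

// Convey the payload onward as a ref-counted scatter-gather list
// (here a single-segment one, as with the POSIX stack). No copy is
// made; the packet only takes over a reference on the buffer.
seastar::net::packet as_sgl(seastar::temporary_buffer<char> payload) {
    seastar::net::fragment frag{payload.get_write(), payload.size()};
    return seastar::net::packet(frag, payload.release());
}
```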
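And here is how I read your placing_data_source proposal – a rough,
hypothetical sketch. Every name below (placing_data_source_impl,
get_into, copying_placing_data_source) is made up for illustration;
only seastar::data_source and temporary_buffer are existing API. The
generic fallback pays the extra memcpy(); the posix-stack variant you
describe would instead have read(2) deposit the bytes straight into
the caller's buffer:

```cpp
#include <algorithm>
#include <cstring>
#include <seastar/core/future.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/temporary_buffer.hh>

class placing_data_source_impl {
public:
    virtual ~placing_data_source_impl() = default;
    // Read up to len bytes directly into the caller-provided buffer.
    virtual seastar::future<size_t> get_into(char* buf, size_t len) = 0;
};

// Generic fallback: copy out of the wrapped data_source. Handling of
// a leftover tail (when the source returns more than len bytes) is
// elided for brevity.
class copying_placing_data_source final : public placing_data_source_impl {
    seastar::data_source _src;
public:
    explicit copying_placing_data_source(seastar::data_source src)
        : _src(std::move(src)) {}

    seastar::future<size_t> get_into(char* buf, size_t len) override {
        return _src.get().then([buf, len](seastar::temporary_buffer<char> b) {
            auto n = std::min(len, b.size());
            std::memcpy(buf, b.get(), n);  // the copy a posix-stack
                                           // specialization would avoid
            return n;
        });
    }
};
```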
Regards,
Radek