Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?

On 15/01/2020 15.21, Radoslaw Zarzynski wrote:
Hi Avi,

On Wed, Jan 15, 2020 at 12:22 PM Avi Kivity <avi@xxxxxxxxxxxx> wrote:
I don't understand why you say this. If Seastar performs the memcpy
transparently when needed, why do you need separate builds?
Because I don't want to just handle both stacks. I want to handle them
as efficiently as possible. Preferably, I want to have a single build only.
The most efficient strategy for buffer allocation varies depending
on which stack has been selected:

   * for POSIX the best strategy is to reuse the read() memcpy by providing
     an application-allocated buffer with proper alignment. The `memcpy()`
     happens anyway for the sake of preserving kernel-user isolation,
     and we can do little about that.
     Let's call this strategy "application-provided".
   * for native all you want is to grab and ref-count++ the memory buffer
     where the NIC has placed the payload. There is no memcpy / page
     remapping anywhere. There is no memcpy that could be reused for
     anything. Even a transparent memcpy is wasteful.
     Let's call this strategy "stack-provided".
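
To make the contrast concrete, here is a rough sketch of the two strategies
in C++. It is illustrative only, not crimson code: the function names and the
4 KiB alignment are assumptions of mine; the Seastar calls (input_stream::read(),
temporary_buffer) are real, but the surrounding structure is just a sketch.

#include <seastar/core/future.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/temporary_buffer.hh>
#include <cstdlib>
#include <unistd.h>

// "application-provided" (POSIX-friendly): the application owns an aligned
// buffer, and the kernel->user memcpy that read() performs anyway lands the
// data exactly where the application wants it.
ssize_t read_into_app_buffer(int fd, size_t len) {
    void* buf = nullptr;
    if (posix_memalign(&buf, 4096, len) != 0) {
        return -1;
    }
    ssize_t n = ::read(fd, buf, len);   // the unavoidable copy is reused here
    // ... hand `buf` down the storage path without further copies ...
    free(buf);
    return n;
}

// "stack-provided" (native-friendly): the stack owns the memory (on the
// native stack, the buffer the NIC wrote into); the application only takes
// a ref-counted reference, so no memcpy happens at all.
seastar::future<> read_from_stack(seastar::input_stream<char>& in) {
    return in.read().then([] (seastar::temporary_buffer<char> buf) {
        // `buf` shares ownership of the underlying memory; process it in place.
        return seastar::make_ready_future<>();
    });
}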

In my understanding, which stack is in use isn't exposed to the
application by Seastar yet. If so, this translates into choosing
the strategy **blindly**. If crimson decided to go with:

   * the native-friendly "stack-provided" while the actual stack is
     the POSIX one, then the application would lose the way to
     reuse the memcpy;
   * the POSIX-friendly "application-provided" while the actual stack
     is native, then the transparent memcpy() would happen.


This is what Seastar provides today, and I agree it should be improved.


This is the memcpy I was referring to. And the solution I'd like to see
is one where the application tells seastar what buffers it wants to see
the data in.
I think the misunderstanding comes from our different ideas of what
the application tells Seastar. I perceive that your point assumes the choice
is constant:

   * ALWAYS use "application-provided" OR
   * ALWAYS use "stack-provided".

Unfortunately, underlying OS / hardware characteristics make
this simple approach inefficient. To avoid unnecessary overhead
in both cases, the application should be able to say:

   * WHEN "posix" use "application-provided" AND
   * WHEN "native" use "stack-provided".


This is what I don't understand. The application has no requirement to do something depending on the stack. It has requirements (or non-requirements) on alignment.


What I propose is:


 * WHEN you have a requirement for aligned buffers, use "application-provided"

 * WHEN you do not have a requirement for aligned buffers, use "stack-provided"


After the application starts, do you not know whether you have a requirement for alignment or not?
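
As a sketch of what I mean (the names are mine and nothing like this exists
in Seastar today), the application would state its placement requirement once
and let the stack pick the cheapest way to satisfy it:

#include <cstddef>
#include <optional>

struct placement_requirement {
    // e.g. 4096 for buffers that will later be used for direct I/O;
    // std::nullopt when the application does not care where the data lands.
    std::optional<std::size_t> alignment;
};

// Hypothetical hook. Given the requirement, the stack decides:
//   * POSIX stack  + alignment set -> read() into a suitably aligned buffer,
//     reusing the kernel->user copy ("application-provided");
//   * native stack + no alignment  -> hand back a reference to the buffer the
//     NIC wrote into, with no copy ("stack-provided");
//   * native stack + alignment set -> the stack decides whether a copy is
//     actually needed to meet the requirement.
void configure_receive_buffers(const placement_requirement& req) {
    (void)req;   // a real stack would record the requirement here
}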



If you know you don't need the memcpy, don't provide your pre-allocated
buffers and it won't happen.
This is exactly what I'm targeting. If I could make this decision
(whether to provide an application-allocated buffer or accept
a stack-provided one) conditionally (isNative() / isPosix()), then
it's done! :-)


Why is it dependent on the stack? It should depend only on whether you need alignment (or more generically, placement) or not.


Let's look at a similar problem and a similar solution. Disks have either a volatile or a non-volatile write cache. Applications don't ask whether the disk has a volatile write cache or not. Instead, they indicate their requirements. If they want to hit the media, they open the file with O_DSYNC (other methods are available; let's look at this example). The storage stack then takes care of the rest. If the write cache is non-volatile, then it is enough to send the write to the disk. If the write cache is volatile, then the storage stack can set the FUA bit, or, if the write is split, it can issue several volatile writes and then instruct the disk to flush the write cache, and only then return to the user.
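
The analogy in code, using only standard POSIX calls (write_durably is just
an illustrative name):

#include <fcntl.h>
#include <unistd.h>

// The application expresses its durability requirement with O_DSYNC and lets
// the storage stack decide how to honour it (FUA, a cache flush, or nothing).
int write_durably(const char* path, const void* data, size_t len) {
    // O_DSYNC states the requirement ("hit stable media"), not the mechanism.
    int fd = ::open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
    if (fd < 0) {
        return -1;
    }
    ssize_t n = ::write(fd, data, len);   // returns only once the data is durable
    ::close(fd);
    return (n == static_cast<ssize_t>(len)) ? 0 : -1;
}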


The application indicates to the stack what it requires, rather than asking the stack how it is implemented. I would like something similar for data placement.




Regards,
Radek

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



