Re: crimson-osd vs legacy-osd: should the perf difference be already noticeable?

On 15/01/2020 16.12, Radoslaw Zarzynski wrote:
On Wed, Jan 15, 2020 at 2:34 PM Avi Kivity <avi@xxxxxxxxxxxx> wrote:
This is what Seastar provides today and I agree it can and should be improved.
I agree. Let's continue our discussion and try to find the way. :-)

What I propose is:


   * WHEN you have a requirement for aligned buffers, use
     "application-provided"

   * WHEN you do not have a requirement for aligned buffers, use
     "stack-provided"


After the application starts, do you not know whether you have a
requirement for alignment or not?
We have the knowledge on alignment, so let's experiment with
the proposed ruleset to judge performance repercussions.

Today, when crimson-osd is all about the cyan store (a simple,
RAM-backed store for testing), we can definitely say there is
no requirement for alignment. Based on that and the rule:

    * WHEN you do not have a requirement for aligned buffers, use
    "stack-provided".

Therefore we should opt for "stack-provided". Let's verify
the result:

   * if the actual stack is native, everything is OK: there will
     not be even a single memcpy, and no syscall.
   * if the actual stack is POSIX, as there is no provided buffer,
     there is also no buffer.length. The stack needs to guess
     how many bytes to read() from the socket. If the guessed
     number is too small, the application is hurt by excessive
     syscalls. This happens today. :-(


Ok, so it's not just about alignment, but also about sizes. We can also allow the application to specify how many bytes it wants to read (in fact, it can already do that with read_exactly, but input_stream does not pass the information along).


Let's list the possible cases:

- the application knows nothing (common when parsing a complex stream containing small objects). This is where Scylla is, similar to an HTTP server.

- the protocol has rigid structure (fixed size header + variable payload). The application wants the header in a linearized buffer and the payload in a free-form iovec. This corresponds to cyanstore.

- the protocol has rigid structure as above. The application wants the header in a linearized buffer and the payload in its own buffers due to alignment or ownership requirements. This corresponds to a production storage server that has alignment requirements for talking to storage and ownership/placement requirements for caching blocks.


In the first case, input_stream should provide buffers as it reads them. Buffers can end due to packet boundaries (native stack), input_stream buffer boundaries (posix stack) or due to exhausting all received data (both).

In the second case, the socket (perhaps not input_stream) should linearize the header, provide the payload as a sequence of buffers, and should attempt not to over-read (over-reading the payload can require linearization of the next header or trailer).

In the third case, the socket linearizes both the header and payload, the first into a buffer it allocates by itself, the second into a buffer provided by the user.


Is this a good set of capabilities to provide?


If it is, then we can implement "linearizes" differently for each stack, and also depending on whether the buffer is provided by the user or the stack.


For buffers provided by the stack (for which there is only a linearization requirement, not a placement requirement):

 - posix allocates a buffer and issues read() syscalls until the buffer is full

 - native will attempt to temporary_buffer::share() the buffer if it fits into a packet, and allocate and copy if it does not

For buffers provided by the user (placement requirement):

 - posix issues repeated read() syscalls until the buffer is full

 - native will memcpy from raw packets into the buffers


Note: "buffer" here can also be an iovec or equivalent. In that case it will be read into using readv(), and "linearization" only happens within individual elements of the iovec.

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



