Hi Avi,

I responded inline to both of your messages.

> On 13/01/2020 21.56, Radoslaw Zarzynski wrote:
> Well, that's what I would like to avoid. I'd like a developer to know
> that if they are developing with the posix stack, the application would
> work and work well with the native stack, not that they have to retest
> everything.

I see your point. It's a valid one.

> That memcpy would be incurred only if the state machine specified it
> needed its own buffer. If it specified it can run from a stack-provided
> buffer, it would still be zero copy.

If we could make this decision at run-time, then fine. I'm afraid the
application currently has no way to determine which stack is to be used
and to adjust the choice dynamically. If so, supporting both POSIX and
native efficiently (with no trade-offs) would boil down to a
compile-time decision, and thus separate builds. Still, twice the
testing. :-(

> What your proposal does is reuse the read(2) call's copy to userspace
> for the application's purposes - the copy is still there. So the native
> stack isn't handicapped by this copy, it happens in both.

This assumes that crimson will always need to memcpy() the data
retrieved from a network stack. I believe that's not the case.

For the kernel drivers (POSIX stack + kernel's storage) the read() in
Seastar is actually reused to give the kernel an opportunity to remap
pages, instead of doing memcpy(), when the retrieved payload is written
to storage. This is possible because the buffer Seastar reads into has
been properly aligned.

Apart from the read() itself there is no inherent memcpy() on the data
path. When flowing through it, the payload is always conveyed as a
ref-counted scatter-gather list. This holds even if the SGL has only a
single segment, as when using POSIX with the particular implementation
of input_buffer_factory (a minimal sketch of this path is appended at
the end of this mail).

For the user-space drivers (native stack + SPDK) there is a chance to
squeeze out memcpy() / remapping entirely. Of course, this assumes that
the storage HW is able to deal with inflated SGLs efficiently. I hope
vendors can shed more light on that.

From our last discussion I recall your point about the impact that
ref-counting can have on cache density (in the sense of e.g.
BlueStore's cache). For sure the segments of our SGL will be bigger
than the actual payload and will contain metadata or other junk. My
answer would be that the application's caching policy shouldn't belong
to the network layer. It has too little information to judge whether a
given buffer needs to be cached or not. There are many cases where you
won't cache. To exemplify while staying in BlueStore's domain:
`bluestore_default_buffered_write` is `false` by default. That is,
BlueStore **doesn't cache on writes**.

On Tue, Jan 14, 2020 at 12:16 PM Avi Kivity <avi@xxxxxxxxxxxx> wrote:
> The default placing_data_source would just copy data from the original
> data_source to the buffers provided by the user.
>
> The placing_data_source provided by
> posix_data_source_impl::to_placing_data_source() would cooperate with
> the posix stack to read directly into the buffers provided by the user
> (reusing the copy performed by the system call).
>
> If a data movement engine is available, the native stack might program
> it to perform the copy.

Well, this is about off-loading the memcpy() we would introduce to the
native stack. I still think the best option – from the performance
point of view – is to avoid this shuffling at all. The sacrifice for
that would be the maintainability concern coming from differentiated
paths in the network layer. (I appended below a rough sketch of how I
read the proposed interface, to make sure we're talking about the same
thing.)
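For concreteness, here is a minimal sketch of the aligned-buffer,
ref-counted SGL path I described above. seastar::temporary_buffer,
net::fragment and net::packet are real Seastar types, but
make_read_buffer() / as_sgl() and the 4 KiB page size are my
illustrative assumptions, not actual crimson code:

```cpp
#include <seastar/core/temporary_buffer.hh>
#include <seastar/net/packet.hh>

// Page-aligned allocation gives the kernel a chance to remap pages
// instead of memcpy()ing when the payload is later written to storage.
seastar::temporary_buffer<char> make_read_buffer(size_t size) {
    constexpr size_t page_size = 4096;  // assumption: 4 KiB pages
    return seastar::temporary_buffer<char>::aligned(page_size, size);
}

// Convey the payload onward as a ref-counted scatter-gather list
// (here a single-segment one, as with the POSIX stack). No copy is
// made; the packet only takes over a reference on the buffer.
seastar::net::packet as_sgl(seastar::temporary_buffer<char> payload) {
    seastar::net::fragment frag{payload.get_write(), payload.size()};
    return seastar::net::packet(frag, payload.release());
}
```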
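And here is how I read your placing_data_source proposal – a rough,
hypothetical sketch. Every name below (placing_data_source_impl,
get_into, copying_placing_data_source) is made up for illustration;
only seastar::data_source and temporary_buffer are existing API. The
generic fallback pays the extra memcpy(); the posix-stack variant you
describe would instead have read(2) deposit the bytes straight into
the caller's buffer:

```cpp
#include <algorithm>
#include <cstring>
#include <seastar/core/future.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/temporary_buffer.hh>

class placing_data_source_impl {
public:
    virtual ~placing_data_source_impl() = default;
    // Read up to len bytes directly into the caller-provided buffer.
    virtual seastar::future<size_t> get_into(char* buf, size_t len) = 0;
};

// Generic fallback: copy out of the wrapped data_source. Handling of
// a leftover tail (when the source returns more than len bytes) is
// elided for brevity.
class copying_placing_data_source final : public placing_data_source_impl {
    seastar::data_source _src;
public:
    explicit copying_placing_data_source(seastar::data_source src)
        : _src(std::move(src)) {}

    seastar::future<size_t> get_into(char* buf, size_t len) override {
        return _src.get().then([buf, len](seastar::temporary_buffer<char> b) {
            auto n = std::min(len, b.size());
            std::memcpy(buf, b.get(), n);  // the copy a posix-stack
                                           // specialization would avoid
            return n;
        });
    }
};
```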
Regards,
Radek