On Thu, Mar 15, 2018 at 5:30 PM, Boaz Harrosh <boazh@xxxxxxxxxx> wrote:
> On 15/03/18 18:10, Miklos Szeredi wrote:
> <>
>>> This can never properly translate. Even a simple file on disk
>>> is linear for the app (unaligned buffer) but is scattered on
>>> multiple blocks on disk. Yes perhaps networking can somewhat work
>>> if you pre/post-pend the headers you need.
>>> And you restrict direct IO semantics on everything specially the APP
>>> with my system you can do zero copy on any kind of application
>>
>> I lost you there, sorry.
>>
>> How will your scheme deal with alignment issues better than my scheme?
>>
>
> In my pmem case easy memcpy. This will not work if you need to go
> to hard disk I agree. (Which is not a priority for me)
>
>>> And this assumes networking or some-device. Which means going back
>>> to the Kernel, which in ZUFS rules you must return -ASYNC to the zuf
>>> and complete in a background ASYNC thread. This is an order of a magnitude
>>> higher latency then what I showed here.
>>
>> Indeed.
>>
>>> And what about the SYNC copy from Server to APP. With a pipe you
>>> are forcing me to go back to the Kernel to execute the copy. which
>>> means two more crossings. This will double the round trips.
>>
>> If you are trying to minimize the roundtrips, why not cache the
>> mapping in the kernel? That way you don't necessarily have to go to
>> userspace at all. With readahead logic, the server will be able to
>> preload the mapping before the reads happen, and you basically get the
>> same speed as an in-kernel fs would.
>>
>
> Yes as I said that was my first approach. But at the end this is
> always a special workload optimization but in the general case this
> actually adds a round trip and a huge complexity that always comes
> to bite you.

Ideally most of the complexity would be in the page cache. Not sure
how ready it is to handle pmem pages?

The general case (non-pmem) will always have to be handled
differently; you've just stated that it's much less latency sensitive
and needs async handling. Basing the design on just trying to make it
use the same mechanism (userspace copy) is flawed in my opinion, since
it's suboptimal for either case.

Thanks,
Miklos