On 15/03/18 18:10, Miklos Szeredi wrote:
>> This can never properly translate. Even a simple file on disk
>> is linear for the app (unaligned buffer) but is scattered across
>> multiple blocks on disk. Yes, perhaps networking can somewhat work
>> if you pre/post-pend the headers you need.
>> And you restrict direct-IO semantics on everything, especially the
>> APP. With my system you can do zero copy with any kind of
>> application.
>
> I lost you there, sorry.
>
> How will your scheme deal with alignment issues better than my scheme?
>

In my pmem case it is an easy memcpy. I agree this will not work if you
need to go to a hard disk. (Which is not a priority for me.)

>> And this assumes networking or some device, which means going back
>> to the Kernel. Under ZUFS rules you must return -ASYNC to the zuf
>> and complete in a background ASYNC thread. This is an order of
>> magnitude higher latency than what I showed here.
>
> Indeed.
>
>> And what about the SYNC copy from Server to APP? With a pipe you
>> are forcing me to go back to the Kernel to execute the copy, which
>> means two more crossings. This will double the round trips.
>
> If you are trying to minimize the roundtrips, why not cache the
> mapping in the kernel? That way you don't necessarily have to go to
> userspace at all. With readahead logic, the server will be able to
> preload the mapping before the reads happen, and you basically get
> the same speed as an in-kernel fs would.
>

Yes, as I said, that was my first approach. But in the end this is
always a special-workload optimization; in the general case it actually
adds a round trip, and a huge complexity that always comes back to bite
you.

> Also I don't quite understand how you are planning to generalize
> beyond the pmem case. The interface is ready for that, sure. But
> what about caching? Will that be done in the server? Does that make
> sense? The kernel already has a page cache for that purpose, and a
> userspace cache won't ever be as good as the kernel cache.
>

I explained about that. We can easily support page-cache in zufs.
Here is what I wrote:

> Please note that it will be very easy with this API to also support
> page-cache for FSs that want it, like the network FSs you mentioned.
> The FS will set a bit in the fs_register call to say that it would
> rather use the page cache. These types of FSs will run on a different
> kind of BDI which says "Yes, page cache please". All the IO entry
> vectors point to the generic_iter API, and instead we implement
> read/write_pages(). At read/write_pages() we do the exact same
> OP_READ/WRITE as today: map the cache pages to the zus VM, dispatch,
> return, release the page_lock. All is happy. Anyone wanting to
> contribute this is very welcome.

Yes, please, no caching at the zus level; that's insane. (A rough
sketch of that read-side flow is at the bottom of this mail.)

> Thanks,
> Miklos
>

Thanks
Boaz
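
To make the quoted read/write_pages() flow a bit more concrete, below
is a minimal sketch of what the read side could look like. This is an
illustration only: zuf_map_pages_to_zus(), zuf_dispatch() and
ZUS_OP_READ are hypothetical placeholder names for the "map the cache
pages to the zus VM, dispatch" steps described above, not the real
zufs API; only the generic page-cache calls (page_offset,
SetPageUptodate, unlock_page) are actual kernel functions.

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Sketch of a ->readpage for a "Yes page cache please" zufs FS.
 * The page arrives locked from the page cache; we map it into the
 * zus server's VM, dispatch the same OP_READ used today, and release
 * the page lock on completion.
 */
static int zuf_readpage(struct file *file, struct page *page)
{
	struct inode *inode = page->mapping->host;
	int err;

	/* Map the locked cache page into the zus VM (hypothetical helper). */
	err = zuf_map_pages_to_zus(inode, &page, 1);
	if (!err)
		/* Dispatch the usual OP_READ to the server (hypothetical helper). */
		err = zuf_dispatch(inode, ZUS_OP_READ,
				   page_offset(page), PAGE_SIZE);

	if (!err)
		SetPageUptodate(page);

	/* Release the page_lock taken by the page cache. */
	unlock_page(page);
	return err;
}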