On Sat, 24 Sep 2016 18:29:01 +0100 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Sep 24, 2016 at 04:59:08AM +0100, Al Viro wrote:
>
> > FWIW, updated (with fixes) and force-pushed. Added piece:
> > default_file_splice_read() converted to iov_iter. Seems to work, after
> > fixing a braino in __pipe_get_pages(). Changed: #4 (sleep only in the
> > beginning, as described above), #6 (context changes from #4), #10 (missing
> > get_page() added in __pipe_get_pages()), #11 (removed pointless truncation
> > of len - ->read_iter() can bloody well handle that on its own) and added #12.
> > Stands at 28 files changed, 657 insertions(+), 1009 deletions(-) now...
>
> I think I see how to get full zero-copy (including the write side
> of things). Just add a "from" side for ITER_PIPE iov_iter (advance,
> get_pages, get_pages_alloc, npages and alignment will need to behave
> differently for "to" and "from" ones) and pull the following trick:
> have fault_in_readable return NULL instead of 0, ERR_PTR(-EFAULT) instead
> of -EFAULT *and* return a struct page if it was asked for a full-page
> range on a page that could be successfully stolen (only "from pipe" iov_iter
> would go for the last one, of course). Then we make generic_perform_write()
> shove the return value of fault-in into 'page'. ->write_begin() is given
> &page as an argument, to return the resulting page via that. All instances
> currently just store into that pointer, completely ignoring the prior value.
> And they'll keep working just fine.
>
> Let's make sure that all method call sites outside of
> generic_perform_write() (there's only one such, actually) have NULL
> stored in there prior to the call. Now we can start switching the
> instances to zero-copy support - all it takes is replacing
> grab_cache_page_write_begin() with "if *page is non-NULL, try to
> shove it (locked, non-uptodate) into pagecache; if that succeeds grab a
> reference to our page and we are done, if it fails - fall back to
> grab_cache_page_write_begin()". Then do get_block, etc., or whatever that
> ->write_begin() instance would normally do, just remember not to zero anything
> if the page had been passed to us by caller.

Interesting stuff. It should also be possible for a filesystem to replace
existing pagecache as a zero-copy overwrite with the migration APIs and just
a little bit of work.
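
For anyone trying to follow the control flow in the quoted proposal, here is a
rough userspace model of it. This is not kernel code and not anybody's actual
patch: struct page is a toy, ERR_PTR()/IS_ERR()/PTR_ERR() are local stand-ins
for the kernel helpers of the same names, and model_fault_in(),
model_write_begin() and model_perform_write() are hypothetical names invented
for this sketch. It only shows the three-way fault-in return (ERR_PTR on
fault, NULL for "nothing to steal", a page for "stolen") and the "use the
donated page, else fall back" step in ->write_begin().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define MAX_ERRNO 4095
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Toy page: just enough state to observe what happened to it. */
struct page {
	int refcount;
	int in_pagecache;
	int zeroed;
};

/*
 * Hypothetical fault-in with the proposed return convention:
 * ERR_PTR(-EFAULT) on failure, NULL on plain success, or a stolen
 * page when the source was a full, stealable pipe page.
 */
static struct page *model_fault_in(int source_faults, struct page *stealable)
{
	if (source_faults)
		return ERR_PTR(-EFAULT);
	return stealable;	/* NULL means "nothing to steal" */
}

/*
 * Hypothetical ->write_begin() instance. *pagep is NULL or a donated
 * page on entry and holds the page to write into on return.
 * insert_succeeds models whether the donated page could be added to
 * the page cache; fresh models what the fallback lookup would return.
 */
static int model_write_begin(struct page **pagep, int insert_succeeds,
			     struct page *fresh)
{
	struct page *donated = *pagep;

	assert(fresh != NULL);
	if (donated && insert_succeeds) {
		donated->in_pagecache = 1;
		donated->refcount++;	/* grab a reference to our page */
		return 0;		/* data already there: do not zero */
	}
	/* Fallback path; this toy instance zeroes the page it obtained. */
	fresh->in_pagecache = 1;
	fresh->refcount++;
	fresh->zeroed = 1;
	*pagep = fresh;
	return 0;
}

/* Hypothetical caller: feed the fault-in result straight into 'page'. */
static int model_perform_write(int source_faults, struct page *stealable,
			       int insert_succeeds, struct page *fresh,
			       struct page **used)
{
	struct page *page = model_fault_in(source_faults, stealable);
	int err;

	if (IS_ERR(page))
		return (int)PTR_ERR(page);
	err = model_write_begin(&page, insert_succeeds, fresh);
	if (err)
		return err;
	*used = page;
	return 0;
}
```

Note that an instance which ignores the incoming *pagep behaves like the
fallback branch alone, which matches the point in the quoted text that
existing instances keep working unchanged.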