On Sat, Sep 24, 2016 at 04:59:08AM +0100, Al Viro wrote: > FWIW, updated (with fixes) and force-pushed. Added piece: > default_file_splice_read() converted to iov_iter. Seems to work, after > fixing a braino in __pipe_get_pages(). Changed: #4 (sleep only in the > beginning, as described above), #6 (context changes from #4), #10 (missing > get_page() added in __pipe_get_pages()), #11 (removed pointless truncation > of len - ->read_iter() can bloody well handle that on its own) and added #12. > Stands at 28 files changed, 657 insertions(+), 1009 deletions(-) now... I think I see how to get full zero-copy (including the write side of things). Just add a "from" side for ITER_PIPE iov_iter (advance, get_pages, get_pages_alloc, npages and alignment will need to behave differently for "to" and "from" ones) and pull the following trick: have fault_in_readable return NULL instead of 0, ERR_PTR(-EFAULT) instead of -EFAULT *and* return a struct page if it was asked for a full-page range on a page that could be successfully stolen (only "from pipe" iov_iter would go for the last one, of course). Then we make generic_perform_write() shove the return value of fault-in into 'page'. ->write_begin() is given &page as an argument, to return the resulting page via that. All instances currently just store into that pointer, completely ignoring the prior value. And they'll keep working just fine. Let's make sure that all method call sites outside of generic_perform_write() (there's only one such, actually) have NULL stored in there prior to the call. Now we can start switching the instances to zero-copy support - all it takes is replacing grab_cache_page_write_begin() with "if *page is non-NULL, try to shove it (locked, non-uptodate) into pagecache; if that succeeds grab a reference to our page and we are done, if it fails - fall back to grab_cache_page_write_begin()". Then do get_block, etc., or whatever that ->write_begin() instance would normally do, just remember not to zero anything if the page had been passed to us by caller. Now all we need is to make sure that iov_iter_copy_from_user_atomic() for those guys recongnizes the case of full-page copy when source and target are the same page and quietly returns PAGE_SIZE. Voila - we can make iter_file_splice_write() pass pipe-backed iov_iter instead of bvec-backed one *and* get write-side zero-copy for all filesystems with ->write_begin() taught to handle that (see above). Since the filesystems with unmodified ->write_begin() will act correctly (just do the copying), we don't have to make that a flagday change; ->write_begin() instances can be switched one by one. Similar treatment of iomap_write_begin()/iomap_write_actor() would cover iomap-using ->write_iter() instances. It's clearly not something I want to touch until -rc1, but it looks feasible for the next cycle, and if done right it promises to unify the plain and splice sides of fuse_dev_...() stuff, simplifying the hell out of them without losing zero-copy there. And if everything really goes right, we might be able to get rid of net/* ->splice_read() and ->sendpage() methods as well... -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html