Re: [PATCH 04/11] splice: lift pipe_lock out of splice_to_pipe()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Sep 24, 2016 at 04:59:08AM +0100, Al Viro wrote:

> 	FWIW, updated (with fixes) and force-pushed.  Added piece:
> default_file_splice_read() converted to iov_iter.  Seems to work, after
> fixing a braino in __pipe_get_pages().  Changed: #4 (sleep only in the
> beginning, as described above), #6 (context changes from #4), #10 (missing
> get_page() added in __pipe_get_pages()), #11 (removed pointless truncation
> of len - ->read_iter() can bloody well handle that on its own) and added #12.
> Stands at 28 files changed, 657 insertions(+), 1009 deletions(-) now...

	I think I see how to get full zero-copy (including the write side
of things).  Just add a "from" side for ITER_PIPE iov_iter (advance,
get_pages, get_pages_alloc, npages and alignment will need to behave
differently for "to" and "from" ones) and pull the following trick:
have fault_in_readable return NULL instead of 0, ERR_PTR(-EFAULT) instead
of -EFAULT *and* return a struct page if it was asked for a full-page
range on a page that could be successfully stolen (only "from pipe" iov_iter
would go for the last one, of course).  Then we make generic_perform_write()
shove the return value of fault-in into 'page'.  ->write_begin() is given
&page as an argument, to return the resulting page via that.  All instances
currently just store into that pointer, completely ignoring the prior value.
And they'll keep working just fine.

	Let's make sure that all method call sites outside of
generic_perform_write() (there's only one such, actually) have NULL
stored in there prior to the call.  Now we can start switching the
instances to zero-copy support - all it takes is replacing
grab_cache_page_write_begin() with "if *page is non-NULL, try to
shove it (locked, non-uptodate) into pagecache; if that succeeds grab a
reference to our page and we are done, if it fails - fall back to
grab_cache_page_write_begin()".  Then do get_block, etc., or whatever that
->write_begin() instance would normally do, just remember not to zero anything
if the page had been passed to us by caller.

	Now all we need is to make sure that iov_iter_copy_from_user_atomic()
for those guys recongnizes the case of full-page copy when source and target
are the same page and quietly returns PAGE_SIZE.  Voila - we can make
iter_file_splice_write() pass pipe-backed iov_iter instead of bvec-backed
one *and* get write-side zero-copy for all filesystems with ->write_begin()
taught to handle that (see above).  Since the filesystems with unmodified
->write_begin() will act correctly (just do the copying), we don't have
to make that a flagday change; ->write_begin() instances can be switched
one by one.  Similar treatment of iomap_write_begin()/iomap_write_actor()
would cover iomap-using ->write_iter() instances.

	It's clearly not something I want to touch until -rc1, but it looks
feasible for the next cycle, and if done right it promises to unify the
plain and splice sides of fuse_dev_...() stuff, simplifying the hell out
of them without losing zero-copy there.  And if everything really goes
right, we might be able to get rid of net/* ->splice_read() and ->sendpage()
methods as well...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux