On Sat, Sep 03, 2016 at 02:45:14AM +0100, Al Viro wrote:
> On Fri, Sep 02, 2016 at 05:57:04PM -0700, Linus Torvalds wrote:
> > On Fri, Sep 2, 2016 at 5:39 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > Fundamentally a splice infrastructure problem.
> >
> > Yeah, I don't really like how we handle the pipe lock.
> >
> > It *might* be possible to instead just increment the reference
> > counters as we build a kvec[] array of them, and simply do the write
> > without holding the pipe lock at all.
> >
> > That has other problems, i.e. concurrent splices from the same pipe
> > would possibly write the same data multiple times, though.
> >
> > But yes, the fundamental problem is how splice wants to take the pipe
> > lock both early and late. Very annoying.
>
> We could, in principle, add another flavour of iov_iter, with bvec
> array attached to it with copy_page_to_iter() sticking an extra ref to that
> page into array. Then, under pipe lock, feed that thing to ->read_iter()
> and do an equivalent of splice_to_pipe() that would take bvec array instead
> of struct page */struct partial_page arrays.

Not sure I quite follow - where do the pages come from? Do we allocate
new pages that get put into the bvec, then run the read which copies
data from the page cache page into them, then hand those pages in the
bvec to the pipe?

ISTR this read->splice_to_pipe path was once supposed to be a zero-copy
path - doesn't this make zero-copy impossible? Or was the zero-copy
splice read path done through some other path I've forgotten about?

> Hell, we could even have copy_to_iter() for these puppies allocate a page,
> stick it into the next bvec and copy into it. Especially if we have those
> bvec zeroed, with copy_page_to_iter() leaving ->bvec pointing to the next
> (unused) bvec and copy_to_iter() doing that only when a page had been
> completely filled. I.e.

This has the same "data copy in the splice read path" problem as the
above interface.
However, I suspect that this interface could actually be used for zero
copy (by stealing pages from the page cache rather than allocating new
pages and copying into them), so it may be a better way to proceed...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx