On Fri, Sep 02, 2016 at 05:57:04PM -0700, Linus Torvalds wrote: > On Fri, Sep 2, 2016 at 5:39 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > Fundamentally a splice infrastructure problem. > > Yeah, I don't really like how we handle the pipe lock. > > It *might* be possible to instead just increment the reference > counters as we build a kvec[] array of them, and simply do teh write > without holding the pipe lock at all. > > That has other problems, ie concurrect spices from the same pipe would > possibly write the same data multiple times, though. > > But yes, the fundamental problem is how splice wants to take the pipe > lock both early and late. Very annoying. We could, in principle, add another flavour of iov_iter, with bvec array attached to it with copy_page_to_iter() sticking an extra ref to that page into array. Then, under pipe lock, feed that thing to ->read_iter() and do an equivalent of splice_to_pipe() that would take bvec array instead of struct page */struct partial_page arrays. Hell, we could even have copy_to_iter() for these puppies allocate a page, stick it into the next bvec and copy into it. Especially if we have those bvec zeroed, with copy_page_to_iter() leaving ->bvec pointing to the next (unused) bvec and copy_to_iter() doing that only when a page had been completely filled. I.e. copy_page_to_iter() if (!iter->nr_segs) return 0; if (iter->bvec->bv_page) { iter->bvec++; if (!--iter->nr_segs) return 0; } stick (page,offset,bytes) into iter->bvec iter->bvec++; iter->nr_segs--; return bytes; copy_to_iter(): wanted = bytes; while (bytes && iter->nr_segs) { if (!iter->bvec->bv_page) iter->bvec->bv_page = alloc_page() n = min(PAGE_SIZE - iter->bvec->bv_len, bytes); memcpy_to_page(addr, iter->bvec->bv_len, n); bytes -= n; iter->bvec->bv_len += n; if (iter->bvec->bv_len == PAGE_SIZE) { iter->bvec++; iter->nr_segs--; } } return wanted - bytes; That should suffice for quite a few of read_iter-using file_operations, if not for all of them. pipe lock is on the outside, same as for write side *and* for default_file_splice_read(). And filesystem gets the same locking it would for read(2)/readv(2)/etc... Comments? -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html