On Thu, Sep 08, 2016 at 09:44:50PM +0100, Al Viro wrote: > On Thu, Sep 08, 2016 at 11:12:33AM -0700, Linus Torvalds wrote: > > > So the XFS model *requires* that XFS_IOLOCK to be outside all the operations. > > > > But then in iter_file_splice_write we have the other ordering. > > > > Now, xfs could do a wrapper for ->splice_write() like it used to, and > > have that same "take the xfs lock around the call to > > iter_file_splice_write(). That was my first obvious patch. > > > > I threw it out because that's garbage too: then we end up doing > > ->write_iter(), which takes the xfs_rw_ilock() again, and would > > immediately deadlock *there* instead. > > > > So the problem really is that the vfs layer seems to simply not allow > > the filesystem to do any locking around the generic page cache helper > > functions. And XFS really wants to do that. > > Why *have* ->splice_read() there at all? Let's use its ->read_iter(), where > it will take its lock as it always did for read. > > All we need is a variant of __generic_file_splice_read() that would pass > a new kind of iov_iter down to filesystem's own ->read_iter(). And let that > guy do whatever locking it wants. It will end up doing a sequence of > copy_page_to_iter() and, possibly, copy_to_iter() (XFS one would only do the > former). So let's add an iov_iter flavour that would simply grab a reference > to page passed to copy_page_to_iter() and allocate-and-copy for copy_to_iter(). > As the result, you'll get an array of <page, offset, count> triples - same > as you would from the existing __generic_file_splice_read(). Pages already > uptodate, with all readahead logics done as usual, etc. > > What else do we need? Just feed the resulting triples into the pipe and > that's it. Sure, they can get stale by truncate/punchhole/whatnot. So > they can right after we return from xfs_file_splice_read()... > > Moreover, I don't see why we need to hold pipe lock the actual call of > ->read_iter(). Right now we only grab it for "feed into pipe buffers" > part. Objections? PS: take a look at how much of do_generic_file_read() logics is kinda-sorta open-coded in __generic_file_splice_read(); readahead stuff, etc. It also assumes that filesystem needs no extra locking for playing with pagecache, etc. That's exactly why XFS ends up having to do a wrapper for that sucker and why we get all this headache. Why bother, when we already have a method that turns "read that much from this offset in this file" into a sequence of "take that many bytes from that offset in this page" and "take that many bytes from that buffer"? It doesn't even need to be a ->splice_read() instance - just a function called by do_splice_to() if ->read_iter() is present. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html