Re: [PATCH 0/5] splice: locking changes and code refactoring

On Tue, Jan 14, 2014 at 05:20:33PM +0000, Al Viro wrote:
> On Tue, Jan 14, 2014 at 05:22:07AM -0800, Christoph Hellwig wrote:
> > On Mon, Jan 13, 2014 at 11:56:46PM +0000, Al Viro wrote:
> > > On Mon, Jan 13, 2014 at 06:14:16AM -0800, Christoph Hellwig wrote:
> > > > ping?  Would be nice to get this into 3.14
> > > 
> > > Umm...  The reason for pipe_lock outside of ->i_mutex is this:
> > > default_file_splice_write() calls splice_from_pipe() with
> > > write_pipe_buf for callback.  splice_from_pipe() calls that
> > > callback under pipe_lock(pipe).  And write_pipe_buf() calls
> > > __kernel_write(), which certainly might want to take ->i_mutex.
> > > 
> > > Now, this codepath isn't taken for files that have non-NULL
> > > ->splice_write(), so that's not an issue for XFS and OCFS2,
> > > but having pipe_lock nest between the ->i_mutex for filesystems
> > > that do and do not have ->splice_write()...  Ouch...
> > 
> > What would be the alternative?  Duplicating the code in even more
> > filesystems to enforce a non-natural locking order for filesystems
> > actually implementing splice?  There don't actually seem to be a whole
> > lot of real filesystems not implementing splice_write, the prime use
> > would be for device drivers or synthetic ones.  I'm not even sure
> > how much that fallback gets used in practice.

Hmm...  In principle, the following would be no worse than what
generic_file_splice_write() is doing: confirm and map the pages, build
an iovec and use ->aio_write() to write it out, then unmap the suckers,
release the ones entirely written to the file and adjust the partially
written one.  All under pipe_lock().  Hell, if we introduce
kernel_writev() (either by calling vfs_writev() or taking do_readv_writev()
sans copying iovec and using that under set_fs()), we could switch
default_file_splice_write() to that and get rid of ->splice_write() for
the majority of filesystems, if not all of them.
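
To make the sequence above concrete, here is a rough userspace model of it,
not kernel code.  It assumes a made-up struct model_pipe / struct model_buf
in place of the real pipe structures, plain writev(2) in place of
->aio_write() or the proposed kernel_writev(), and it leaves out the
confirm/map/unmap steps and pipe_lock() entirely.  All model_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MODEL_MAX_BUFS 16	/* hypothetical limit for this model */

/* Hypothetical stand-in for a pipe buffer: one "page" of pending data. */
struct model_buf {
	const char *data;	/* stands in for the mapped page */
	size_t offset;		/* start of the unconsumed data */
	size_t len;		/* bytes still to be written */
};

/* Hypothetical stand-in for the pipe: a small ring of buffers. */
struct model_pipe {
	struct model_buf bufs[MODEL_MAX_BUFS];
	size_t head;		/* index of the first unconsumed buffer */
	size_t nrbufs;		/* number of buffers in use */
};

/*
 * Build an iovec over all pending buffers, write it out in one go, then
 * release the buffers that were written entirely and adjust the one that
 * was written partially.  Returns what writev() returned.
 */
ssize_t model_splice_write(struct model_pipe *pipe, int fd)
{
	struct iovec iov[MODEL_MAX_BUFS];
	size_t n = pipe->nrbufs;
	size_t i, left;
	ssize_t ret;

	if (n == 0)
		return 0;

	/* One iovec segment per pending buffer, in pipe order. */
	for (i = 0; i < n; i++) {
		struct model_buf *b =
			&pipe->bufs[(pipe->head + i) % MODEL_MAX_BUFS];

		iov[i].iov_base = (void *)(b->data + b->offset);
		iov[i].iov_len = b->len;
	}

	ret = writev(fd, iov, (int)n);
	if (ret <= 0)
		return ret;

	/* Walk the buffers again, consuming the bytes actually written. */
	left = (size_t)ret;
	while (left > 0 && pipe->nrbufs > 0) {
		struct model_buf *b = &pipe->bufs[pipe->head];

		if (left >= b->len) {
			/* Entirely written: release this buffer. */
			left -= b->len;
			b->len = 0;
			pipe->head = (pipe->head + 1) % MODEL_MAX_BUFS;
			pipe->nrbufs--;
		} else {
			/* Partially written: keep it, skip the written part. */
			b->offset += left;
			b->len -= left;
			left = 0;
		}
	}
	return ret;
}
```

A caller would fill in bufs[], set head and nrbufs, and call
model_splice_write() in a loop until nrbufs drops to zero; after a short
write the first remaining buffer simply starts at its adjusted offset.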

Sure, it means copying from pipe buffers to pagecache, but we have
generic_file_splice_write() do that copy anyway - the seemingly conditional
memcpy() in pipe_to_file() is actually unconditional; the
if (page != buf->page) check in there was simply left behind by Nick back
in 2007 ("1/2 splice: dont steal").

Objections, comments?

The problem Christoph was talking about is that generic_file_splice_write()
plays with ->i_mutex and both gets/drops it for each page of IO *and*
causes PITA for any fs that wants some locks of its own taken in addition
to ->i_mutex on the write paths.  What ->splice_write() without page
stealing is doing is pretty much a writev() from array of pages in kernel
space; so it looks like we might as well just reuse writev() guts for that...