Re: xfs_file_splice_read: possible circular locking dependency detected

On Thu, Sep 08, 2016 at 07:50:05PM -0700, Linus Torvalds wrote:
> On Thu, Sep 8, 2016 at 7:34 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > IDGI.  Suppose we do splice from file to pipe.  Everything had been in
> > page cache, so we want to end up with pipe_buffers containing references
> > to those page cache pages.  How do you propose to do that without having
> > grabbed references to them?  What's to keep them from being freed by the
> > time we get to reading from the pipe?
> 
> So that's obviously what we already do. That is, after all, why splice
> doesn't actually keep track of "pages", it keeps track of "struct
> pipe_buffer". So each page has not just offset/len associated with it,
> but also a get/release/verify operation block and some flags with them
> (it might not be a page-cache page, so in some cases it might be a skb
> or something that needs different release semantics).
> 
> And if you can make the iterator basically interface with turning the
> page/offset/len directly into a "struct pipe_buffer" and not do any
> extra reference operations, then it actually would work very well.
> 
> But the way I read your description of what you'd do I just expected
> you to have an extra "get/put" ref at the iterator level.

Umm...  Looks like I misunderstood you, then.  Yes, it ends up with
get/get/put, the last two close in time.  Do you expect that to be a serious
overhead?  atomic_inc + atomic_dec_and_test + branch not taken shouldn't be
_that_ hot, and I would rather avoid complicating do_generic_file_read()
and friends with "iov_iter (somehow) told us not to put this page as we
normally would".  Can be done that way, but let's not bother until it really
shows in profiles.
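
Very roughly, the shape of the thing (illustrative only - the helper name and the
pipe-backed iov_iter plumbing are assumptions, not the actual patch): the "copy" step
for a pipe-destination iterator grabs its own reference and stashes the page into a
pipe_buffer, and the caller's usual put_page() follows right after:

        static size_t copy_page_to_pipe(struct page *page, size_t offset,
                                        size_t bytes, struct iov_iter *i)
        {
                /* assumes the iterator has grown a pipe pointer */
                struct pipe_inode_info *pipe = i->pipe;
                int idx = (pipe->curbuf + pipe->nrbufs) & (pipe->buffers - 1);
                struct pipe_buffer *buf = pipe->bufs + idx;

                get_page(page);         /* second reference, kept by the pipe */
                buf->ops = &page_cache_pipe_buf_ops;    /* ->release() puts the page */
                buf->page = page;
                buf->offset = offset;
                buf->len = bytes;
                pipe->nrbufs++;
                /* do_generic_file_read() then does its normal put_page(), i.e.
                 * get (pagecache lookup) / get (here) / put (caller) */
                return bytes;
        }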

> Maybe I misunderstood. I'd love to see a rough patch.

Cooking it...  The thing I really hate about the use of pipe_buffer is that
we want to keep the "use on-stack array for default pipe size" trick, and
pipe_buffer is fatter than I'd like.  Instead of pointer + two numbers +
something to indicate whether it's picked from page cache or something we'd
allocated, we get pointer + int + int + pointer + int + long, which turns
into 5 words on 64bit.  With a 16-element array of those on the stack frame,
it's not nice - more than half a kilobyte of stack space with ->read_iter()
yet to be called...  bvec would be better (the 60% savings boils down to 384
bytes shaved off that thing), but we'd need to play games with encoding the
"is it page cache or not" bit somewhere in it.
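
Spelled out (struct layouts as in include/linux/pipe_fs_i.h and include/linux/bvec.h
of that vintage; sizes are the usual 64bit ones):

        struct pipe_buffer {            /* 8 + 4+4 + 8 + 4(+pad) + 8 = 40 bytes */
                struct page *page;
                unsigned int offset, len;
                const struct pipe_buf_operations *ops;
                unsigned int flags;
                unsigned long private;
        };

        struct bio_vec {                /* 8 + 4 + 4 = 16 bytes */
                struct page *bv_page;
                unsigned int bv_len;
                unsigned int bv_offset;
        };

        /* 16-element on-stack array: 16 * 40 = 640 bytes vs 16 * 16 = 256 bytes,
         * i.e. 384 bytes (60%) shaved off the frame. */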

BTW, AFAICS that thing can be used to replace _all_ non-default filesystem
instances - lustre, nfs, gfs2, ocfs2, xfs, shmem, even coda.  What remains:
	* fuse_dev_splice_read()
	* relay_file_splice_read()
	* tracing_splice_read_pipe()
	* tracing_buffers_splice_read()
	* sock_splice_read()
TBH, the last one makes me nervous -
        /* Drop the socket lock, otherwise we have reverse
         * locking dependencies between sk_lock and i_mutex
         * here as compared to sendfile(). We enter here
         * with the socket lock held, and splice_to_pipe() will
         * grab the pipe inode lock. For sendfile() emulation,
         * we call into ->sendpage() with the i_mutex lock held
         * and networking will grab the socket lock.
         */
        release_sock(sk);
        ret = splice_to_pipe(pipe, spd);
        lock_sock(sk);
in skb_socket_splice() and
        mutex_unlock(&u->readlock);
        ret = splice_to_pipe(pipe, spd);
        mutex_lock(&u->readlock);
in skb_unix_socket_splice() smell like yet another indication that we are
taking the locks in the wrong order.  OTOH, lifting the pipe lock all the way
out of that, especially in the last one, really smells like asking for
deadlocks.  It is a separate issue, but it'll also need looking into...
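
To spell out the orderings involved (my reading of the comment quoted above, not
lockdep output):

        /*
         * sendfile()-style path:    i_mutex   -> sk_lock    (->sendpage())
         * splice pipe -> file:      pipe lock -> i_mutex    (splice_from_pipe())
         * splice socket -> pipe:    sk_lock   -> pipe lock  (if sk_lock were kept
         *                                                    across splice_to_pipe())
         *
         * Keeping the socket lock across splice_to_pipe() would close the cycle,
         * hence the unlock/relock dance; the same reasoning applies to u->readlock
         * in the AF_UNIX case.
         */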

I wonder if relay_file_read() would be better off converted to ->read_iter()
and unified with relay_file_splice_read() - we do copy_to_user() from a
vmap'ed area, but we have the array of underlying struct page *, so it could
switch to copy_page_to_iter(), at which point ->splice_read() would _probably_
be OK with a switch to ->read_iter() use.  The tricky part is the games it
plays with relay_consume_bytes() in relay_pipe_buf_release().  Who maintains
that thing (kernel/relay.c) these days?  git log for it looks like it's been
pretty much abandoned...
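
The copy_page_to_iter() conversion would look something like this (a sketch only;
the helper name is made up, buf->page_array is the per-buffer page array from
kernel/relay.c, and sub-buffer wraparound and padding are ignored):

        static size_t relay_copy_to_iter(struct rchan_buf *buf, size_t pos,
                                         size_t count, struct iov_iter *to)
        {
                size_t copied = 0;

                while (count) {
                        struct page *page = buf->page_array[pos >> PAGE_SHIFT];
                        size_t off = pos & (PAGE_SIZE - 1);
                        size_t n = min_t(size_t, count, PAGE_SIZE - off);

                        n = copy_page_to_iter(page, off, n, to);
                        if (!n)
                                break;          /* short copy - caller deals with it */
                        copied += n;
                        pos += n;
                        count -= n;
                }
                return copied;
        }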


