On Thu, Oct 12, 2006 at 12:54:09PM -0700, Andrew Morton wrote: > > Shouldn't we be taking this before calling into ->prepare_write() and > > ->commit_write(). What's preventing generic_file_splice_write() from racing > > a truncate? Or maybe even another write? > > The lock_page() will block truncate and will block write()s to this particular > page. Ok. > > A quick look through other callers reveals that generic_file_aio_write() and > > do_lo_send_aops() both are careful to take i_mutex. > > I'm trying to remember what i_mutex actually protects in this context. > i_size, certainly - if we go changing the file size without locks then > other places might get surprised. For example, a concurrent write() at a > larger file offset might try to increase i_size but if it loses the race > against the unlocked i_size-changing thread, the inode ends up with the > smaller i_size. I'm also worried about concurrent allocation tree changes. Perhaps I'm mistaken and all file systems we care about can handle them happening concurrently, but otherwise couldn't two processes writing to different sparse regions in a file cause problems? One process via file write and the other via a splice write. > So yup, we need i_mutex if only for that reason. Ok. Here's a first pass. The double lock is ugly, but as far as I can tell we need it. Unless there's a rule about ordering between pipe inodes and "other" inodes that I don't know about. Compile tested only. I probably won't get a chance to actually run it until late this weekend at the earliest :/ --Mark -- Mark Fasheh Senior Software Developer, Oracle mark.fasheh@xxxxxxxxxx From: Mark Fasheh <mark.fasheh@xxxxxxxxxx> [PATCH] Take i_mutex in splice_from_pipe() The splice_actor may be calling ->prepare_write() and ->commit_write(). We want i_mutex on the inode being written to before calling those so that we don't race i_size changes. Signed-off-by: Mark Fasheh <mark.fasheh@xxxxxxxxxx> --- fs/splice.c | 21 +++++++++++++++++++-- 1 files changed, 19 insertions(+), 2 deletions(-) 96e880f22bd1c0e2809ebbfe5bf122ed67019e33 diff --git a/fs/splice.c b/fs/splice.c index 13e92dd..e1ecb9e 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -713,6 +713,7 @@ ssize_t splice_from_pipe(struct pipe_ino { int ret, do_wakeup, err; struct splice_desc sd; + struct inode *inode = out->f_mapping->host; ret = 0; do_wakeup = 0; @@ -722,8 +723,23 @@ ssize_t splice_from_pipe(struct pipe_ino sd.file = out; sd.pos = *ppos; - if (pipe->inode) - mutex_lock(&pipe->inode->i_mutex); + /* + * The actor worker might be calling ->prepare_write and + * ->commit_write. Most of the time, these expect i_mutex to + * be held. Since this may result in an ABBA deadlock with + * pipe->inode, we have to order lock acquiry here. + */ + if (pipe->inode) { + if (pipe->inode < inode) { + mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_PARENT); + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + } else { + mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT); + mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_CHILD); + } + } else { + mutex_lock(&inode->i_mutex); + } for (;;) { if (pipe->nrbufs) { @@ -799,6 +815,7 @@ ssize_t splice_from_pipe(struct pipe_ino if (pipe->inode) mutex_unlock(&pipe->inode->i_mutex); + mutex_unlock(&inode->i_mutex); if (do_wakeup) { smp_mb(); -- 1.3.3 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html