On Wed, Sep 14, 2016 at 3:43 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > Use vfs_copy_file_range() helper instead of calling do_splice_direct() > when copying up file data. > When copying up within the same fs, which supports copy_file_range(), > fs implementation can be more efficient then do_splice_direct(). > vfs_copy_file_range() helper falls back to do_splice_direct() if > it cannot use the file system's copy_file_range() implementation. > > A previous change added a vfs_clone_file_range() call before > the data copy loop, so this change is only effective for filesystems > that support copy_file_range() and *do not* support clone_file_range(). > At the moment, there are no such filesystems in the kernel that > can be used as overlayfs upper, so I tested this change by disabling > the vfs_clone_file_range() call. > > Tested correct behavior when lower and upper are on: > 1. same ext4 (copy) > 2. same xfs + reflink patches + mkfs.xfs (copy) > 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink) > 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy) > > For comparison, on my laptop, xfstest overlay/001 (copy up of large > sparse files) takes less than 1 second in the xfs reflink setup vs. > 25 seconds on the rest of the setups. > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > --- > fs/overlayfs/copy_up.c | 10 +++++----- > fs/read_write.c | 1 + > 2 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c > index ba039f8..a6d6bac 100644 > --- a/fs/overlayfs/copy_up.c > +++ b/fs/overlayfs/copy_up.c > @@ -146,7 +146,6 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len) > /* Can't clone, so now we try to copy the data */ > error = 0; > > - /* FIXME: copy up sparse files efficiently */ > while (len) { > size_t this_len = OVL_COPY_UP_CHUNK_SIZE; > long bytes; > @@ -159,15 +158,16 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len) > break; > } > > - bytes = do_splice_direct(old_file, &old_pos, > - new_file, &new_pos, > - this_len, SPLICE_F_MOVE); > + bytes = vfs_copy_file_range(old_file, old_pos, > + new_file, new_pos, > + this_len, 0); > if (bytes <= 0) { > error = bytes; > break; > } > - WARN_ON(old_pos != new_pos); > > + old_pos += bytes; > + new_pos += bytes; > len -= bytes; > } > out: > diff --git a/fs/read_write.c b/fs/read_write.c > index 6975fe8..dfc083a 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1515,6 +1515,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > file_out->f_op->copy_file_range) > ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, > pos_out, len, flags); > + /* FIXME: copy sparse file range efficiently */ > if (ret == -EOPNOTSUPP) > ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, > len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0); > -- > 2.7.4 > Guys, I just hit a lockdep warning on nesting an mnt_want_write() with this patch because overlay already holds the write lock when getting to ovl_copy_up_data() I don't know how I missed it before. Is it really a problem to nest this lock? Should I factor our another set of _locked helpers from the vfs_{copy,clone}_file_range helpers? =========== [ 675.814754] ============================================= [ 675.814755] [ INFO: possible recursive locking detected ] [ 675.814757] 4.8.0-rc5+ #5 Tainted: G W O [ 675.814758] --------------------------------------------- [ 675.814759] xfs_io/15241 is trying to acquire lock: [ 675.814760] (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0 [ 675.814767] but task is already holding lock: [ 675.814768] (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0 [ 675.814772] other info that might help us debug this: [ 675.814773] Possible unsafe locking scenario: [ 675.814774] CPU0 [ 675.814775] ---- [ 675.814775] lock(sb_writers#7); [ 675.814777] lock(sb_writers#7); [ 675.814779] *** DEADLOCK *** [ 675.814780] May be due to missing lock nesting notation [ 675.814782] 4 locks held by xfs_io/15241: [ 675.814783] #0: (sb_writers#7){.+.+.+}, at: [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0 [ 675.814787] #1: (&type->s_vfs_rename_key){+.+.+.}, at: [<ffffffffa227f992>] lock_rename+0x32/0x100 [ 675.814791] #2: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffffa227fa40>] lock_rename+0xe0/0x100 [ 675.814796] #3: (&type->i_mutex_dir_key#2/5){+.+.+.}, at: [<ffffffffa227fa55>] lock_rename+0xf5/0x100 [ 675.814800] stack backtrace: [ 675.814802] CPU: 2 PID: 15241 Comm: xfs_io Tainted: G W O 4.8.0-rc5+ #5 [ 675.814804] Hardware name: Dell Inc. Latitude E7450/0D8H72, BIOS A13 05/17/2016 [ 675.814805] 0000000000000086 00000000029bfc05 ffff9cd454ab78b0 ffffffffa2474d83 [ 675.814808] ffffffffa3a7b6f0 ffff9cd4685f0000 ffff9cd454ab7980 ffffffffa20ed54d [ 675.814810] 0000000000000296 0000000400000246 ffff9cd400000000 ffffffffa3392101 [ 675.814813] Call Trace: [ 675.814816] [<ffffffffa2474d83>] dump_stack+0x85/0xc2 [ 675.814820] [<ffffffffa20ed54d>] __lock_acquire+0x148d/0x14f0 [ 675.814822] [<ffffffffa22ec170>] ? dquot_file_open+0x40/0x50 [ 675.814825] [<ffffffffa20eda70>] lock_acquire+0x100/0x1f0 [ 675.814826] [<ffffffffa2277144>] ? __sb_start_write+0xb4/0xf0 [ 675.814829] [<ffffffffa20e6b1f>] percpu_down_read+0x4f/0xa0 [ 675.814830] [<ffffffffa2277144>] ? __sb_start_write+0xb4/0xf0 [ 675.814832] [<ffffffffa2277144>] __sb_start_write+0xb4/0xf0 [ 675.814834] [<ffffffffa23cdd2d>] ? security_file_permission+0x3d/0xc0 [ 675.814837] [<ffffffffa229ba88>] mnt_want_write_file+0x28/0x60 [ 675.814839] [<ffffffffa2274dd2>] vfs_copy_file_range+0xc2/0x270 [ 675.814844] [<ffffffffc0c797f6>] ovl_copy_up_one+0x4d6/0x710 [overlay] [ 675.814847] [<ffffffffc0c79b16>] ovl_copy_up+0xe6/0x118 [overlay] [ 675.814850] [<ffffffffc0c75d6d>] ovl_open_maybe_copy_up+0x8d/0xd0 [overlay] [ 675.814852] [<ffffffffc0c73303>] ovl_d_real+0xd3/0x130 [overlay] [ 675.814854] [<ffffffffa227209f>] vfs_open+0x6f/0x80 [ 675.814856] [<ffffffffa2284698>] path_openat+0x2a8/0xb80 [ 675.814858] [<ffffffffa22864e1>] do_filp_open+0x91/0x100 [ 675.814861] [<ffffffffa292abe7>] ? _raw_spin_unlock+0x27/0x40 [ 675.814862] [<ffffffffa2297f39>] ? __alloc_fd+0xf9/0x210 [ 675.814864] [<ffffffffa2272494>] do_sys_open+0x124/0x210 [ 675.814867] [<ffffffffa227259e>] SyS_open+0x1e/0x20 [ 675.814868] [<ffffffffa292b540>] entry_SYSCALL_64_fastpath+0x23/0xc1 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html