Re: [RFC][PATCH 4/4] ovl: allow concurrent copy up data of different files

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 16 Nov 2016 07:27:12 +0200

On Tue, Nov 15, 2016 at 11:57 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Tue, Nov 15, 2016 at 09:20:32PM +0200, Amir Goldstein wrote:
>> On Tue, Nov 15, 2016 at 5:56 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>> > On Sat, Nov 12, 2016 at 09:36:04PM +0200, Amir Goldstein wrote:
>> >> Currently, all copy operations are serialized by taking the
>> >> lock_rename(workdir, upperdir) locks for the entire copy up
>> >> operation, including the data copy up.
>> >>
>> >> Copying up data may take a long time with large files and
>> >> holding s_vfs_rename_mutex during that time blocks all
>> >> rename and copy up operations on the entire underlying fs.
>> >>
>> >> This change addresses this problem by changing the copy up
>> >> locking scheme for different types of files as follows.
>> >>
>> >> Directories:
>> >>   <maybe> inode_lock(ovl_inode)
>> >
>> > Hi Amir,
>> >
>> > What does <maybe> mean here? Is it optional?
>> >
>>
>>
>> It means that some execution paths inode_lock(ovl_inode) is taken (e.g. rmdir()
>> of overlay dir) and in some execution paths it is not taken (e.g. when
>> copying up
>> the victim inodes' parents).
>>
>> What I have not explain properly is that my change does not add any new
>> inode_lock(ovl_inode) calls for directories and special files - it is taken in
>> VFS before getting to overlay code.
>> I listed the inode_lock(ovl_inode) in the locking scheme of directories and
>> special files to show that is safe (locking order wise) to take the same lock
>> inside overlay code, for regular files on open.
>>
>
> Ok, got it. Only in case of regular files (non-zero size), we have added
> the inode_lock(ovl_inode) and that's required because we will be dropping
> lock_rename() locks temporarily for data copy and want to make sure
> another thread does not trigger a parallel copy up of same file.
>

Well, to put it more accurately:

Only in case of regular files open for write and truncate(), we have added
the inode_lock(ovl_inode).
The inode_lock(ovl_inode) is required for open for write (no O_TRUNC)
of regular files (non-zero size), because we will be dropping
lock_rename() locks temporarily for data copy and want to make sure
another thread does not trigger a parallel copy up of same file.

So we are taking inode_lock(ovl_inode) for truncate() and open of
zero sized files for no other reason then keeping the code simpler
and being able to declare unconditionally above ovl_copy_up{,_open}():
/* Called with dentry->d_inode lock held */

>> >>     lock_rename(workdir, upperdir)
>> >>       copy up dir to workdir
>> >>       move dir to upperdir
>> >>
>> >> Special and zero size files:
>> >>   inode_lock(ovl_inode)
>> >>     lock_rename(workdir, upperdir)
>> >>       copy up file to workdir (no data)
>> >>       move file to upperdir
>> >>
>> >
>> > If we are copying up directories and special and zero file under
>> > lock_rename() and don't drop it during the whole operation, then
>> > why do we need to take ovl_inode lock.
>> >
>>
>> So we really don't take it, but for all the call sites of ovl_copy_up()
>> except for the ovl_d_real() for regular files open, the ovl_inode lock is
>> already taken in VFS.
>>
>> > IOW, current code seems to be just doing lock_rename(workdir, upperdir)
>> > for the copy up operation. But now this new code also requires
>> > inode_lock(ovl_inode) and I am trying to understand why.
>> >
>>
>> So inode_lock(ovl_inode) is now taken ALSO in the only path it was not
>> already taken until now. And the reason that we take it is so we can release
>> lock_rename() for the duration of copy up data.
>>
>> Hope I was able to clarify myself.
>>
>> Amir.
>>
>> > Thanks
>> > Vivek
>> >
>> >> Regular files with data:
>> >>   inode_lock(ovl_inode)
>> >>     lock_rename(workdir, upperdir)
>> >>       copy up file to workdir (no data)
>> >>     unlock_rename(workdir, upperdir)
>> >>       copy data to file in workdir
>> >>     lock_rename(workdir, upperdir)
>> >>       move file to upperdir
>> >>
>> >> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>> >> ---
>> >>  fs/overlayfs/copy_up.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++---
>> >>  fs/overlayfs/inode.c   |  3 +++
>> >>  2 files changed, 69 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
>> >> index a16127b..1b9705e 100644
>> >> --- a/fs/overlayfs/copy_up.c
>> >> +++ b/fs/overlayfs/copy_up.c
>> >> @@ -230,6 +230,44 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
>> >>       return err;
>> >>  }
>> >>
>> >> +/*
>> >> + * Called with dentry->d_inode lock held only for the last (leaf) copy up
>> >> + * from __ovl_copy_up(), so it is NOT held when called for ancestor
>> >> + * directory from __ovl_copy_up()
>> >> + *
>> >> + * Called with lock_rename(workdir, upperdir) locks held.
>> >> + *
>> >> + * lock_rename() locks remain locked throughout the copy up
>> >> + * of non regular files and zero sized regular files.
>> >> + *
>> >> + * lock_rename() locks are released during ovl_copy_up_data()
>> >> + * of non zero sized regular files. During this time, the overlay
>> >> + * dentry->d_inode lock is still held to avoid concurrent
>> >> + * copy up of files with data.
>> >> + *
>> >> + * Maybe a better description of this locking scheme:
>> >> + *
>> >> + * Directories:
>> >> + *   <maybe> inode_lock(ovl_inode)
>> >> + *     lock_rename(workdir, upperdir)
>> >> + *       copy up dir to workdir
>> >> + *       move dir to upperdir
>> >> + *
>> >> + * Special and zero size files:
>> >> + *   inode_lock(ovl_inode)
>> >> + *     lock_rename(workdir, upperdir)
>> >> + *       copy up file to workdir (no data)
>> >> + *       move file to upperdir
>> >> + *
>> >> + * Regular files with data:
>> >> + *  inode_lock(ovl_inode)
>> >> + *    lock_rename(workdir, upperdir)
>> >> + *      copy up file to workdir (no data)
>> >> + *    unlock_rename(workdir, upperdir)
>> >> + *      copy data to file in workdir
>> >> + *    lock_rename(workdir, upperdir)
>> >> + *      move file to upperdir
>> >> + */
>> >>  static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
>> >>                             struct dentry *dentry, struct path *lowerpath,
>> >>                             struct kstat *stat, const char *link)
>> >> @@ -274,16 +312,39 @@ static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir,
>> >>       if (err)
>> >>               goto out2;
>> >>
>> >> -     if (S_ISREG(stat->mode)) {
>> >> +     if (S_ISREG(stat->mode) && stat->size > 0) {
>> >>               struct path upperpath;
>> >>
>> >>               ovl_path_upper(dentry, &upperpath);
>> >>               BUG_ON(upperpath.dentry != NULL);
>> >>               upperpath.dentry = newdentry;
>> >>
>> >> +             /*
>> >> +              * Release rename locks, because copy data may take a long time,
>> >> +              * and holding s_vfs_rename_mutex will block all rename and
>> >> +              * copy up operations on the entire underlying fs.
>> >> +              * We still hold the overlay inode lock to avoid concurrent
>> >> +              * copy up of this file.
>> >> +              */
>> >> +             unlock_rename(workdir, upperdir);
>> >> +
>> >>               err = ovl_copy_up_data(lowerpath, &upperpath, stat->size);
>> >> +
>> >> +             /*
>> >> +              * Re-aquire rename locks, before moving copied file into place.
>> >> +              */
>> >> +             if (unlikely(lock_rename(workdir, upperdir) != NULL)) {
>> >> +                     pr_err("overlayfs: failed to re-aquire lock_rename\n");
>> >> +                     err = -EIO;
>> >> +             }
>> >> +
>> >>               if (err)
>> >>                       goto out_cleanup;
>> >> +
>> >> +             if (WARN_ON(ovl_dentry_is_upper(dentry))) {
>> >> +                     /* Raced with another copy-up? This shouldn't happen */
>> >> +                     goto out_cleanup;
>> >> +             }
>> >>       }
>> >>
>> >>       err = ovl_copy_xattr(lowerpath->dentry, newdentry);
>> >> @@ -366,15 +427,14 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
>> >>                       return PTR_ERR(link);
>> >>       }
>> >>
>> >> -     err = -EIO;
>> >> -     if (lock_rename(workdir, upperdir) != NULL) {
>> >> +     if (unlikely(lock_rename(workdir, upperdir) != NULL)) {
>> >>               pr_err("overlayfs: failed to lock workdir+upperdir\n");
>> >> +             err = -EIO;
>> >>               goto out_unlock;
>> >>       }
>> >>
>> >>       if (ovl_dentry_is_upper(dentry)) {
>> >>               /* Raced with another copy-up?  Nothing to do, then... */
>> >> -             err = 0;
>> >>               goto out_unlock;
>> >>       }
>> >>
>> >> @@ -433,11 +493,13 @@ static int __ovl_copy_up(struct dentry *dentry, int flags)
>> >>       return err;
>> >>  }
>> >>
>> >> +/* Called with dentry->d_inode lock held */
>> >>  int ovl_copy_up(struct dentry *dentry)
>> >>  {
>> >>       return __ovl_copy_up(dentry, 0);
>> >>  }
>> >>
>> >> +/* Called with dentry->d_inode lock held */
>> >>  int ovl_copy_up_open(struct dentry *dentry, int flags)
>> >>  {
>> >>       return __ovl_copy_up(dentry, flags);
>> >> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
>> >> index 7abae00..532b0d5 100644
>> >> --- a/fs/overlayfs/inode.c
>> >> +++ b/fs/overlayfs/inode.c
>> >> @@ -251,11 +251,14 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
>> >>
>> >>       type = ovl_path_real(dentry, &realpath);
>> >>       if (ovl_open_need_copy_up(file_flags, type, realpath.dentry)) {
>> >> +             /* Take the overlay inode lock to avoid concurrent copy-up */
>> >> +             inode_lock(dentry->d_inode);
>> >>               err = ovl_want_write(dentry);
>> >>               if (!err) {
>> >>                       err = ovl_copy_up_open(dentry, file_flags);
>> >>                       ovl_drop_write(dentry);
>> >>               }
>> >> +             inode_unlock(dentry->d_inode);
>> >>       }
>> >>
>> >>       return err;
>> >> --
>> >> 2.7.4
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
>> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html