On Mon, Dec 16, 2019 at 1:58 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Thu, Dec 12, 2019 at 4:43 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> > It's the same old story that was fixed in commit:
> > 6d0a8a90a5bb ovl: take lower dir inode mutex outside upper sb_writers lock
> >
> > The lower overlay inode mutex is taken inside ovl_llseek() while upper fs
> > sb_writers is held since ovl_maybe_copy_up() of the nested overlay.
> >
> > Since the lower overlay uses the same real fs as the nested overlay's upper,
> > this could really deadlock if the lower overlay inode is being modified
> > (inode mutex taken while trying to take real fs sb_writers).
> >
> > Not a very common case, but still a possible deadlock.
> >
> > One way to avoid this deadlock is probably a bit too hacky for your taste:
> >
> >         /* Skip copy hole optimization for nested overlay */
> >         if (old->mnt->mnt_sb->s_stack_depth)
> >                 skip_hole = false;
> >
> > The other way is to use ovl_inode_lock() in ovl_llseek().
> >
> > Do you have any preference? Something else?
> >
> > Should we maybe use ovl_inode_lock() also in ovl_write_iter() and
> > ovl_ioctl_set_flags()? In all those cases, we are not protecting the overlay
> > inode members, but the real inode members, from concurrent modification
> > through the overlay.
>
> Possibly.  I think this whole thing needs a good analysis of i_rwsem
> use in overlayfs.
>

From what I can find, besides those 3 instances in file.c, there are two
places I know of that call vfs_ functions on the overlay inode:
1. ovl_lookup_real_one(connected, ...), which takes the inode_lock itself
2. ovl_cache_update_ino(path, ...), which is called with inode_lock held

In those places the locking is intentional and looks correct.

And there are 3 more places that take the inode_lock of an overlay dir inode:
1. ovl_dir_llseek() - synchronize modifications to od->cache and to the
   real f_pos
2. ovl_dir_fsync() - synchronize modifications to od->upperfile.
3. ovl_dir_release() - synchronize modifications to od->cache.

Those 3 places were written before ovl_inode_lock() existed.

The real f_pos could be protected by ovl_inode_lock(), same as in ovl_llseek().
od->upperfile could be protected by ovl_inode_lock(), same as in copy up.
od->cache is created/accessed from ovl_iterate() with inode_lock protection
from the vfs - I don't know if we want to change that to also take
ovl_inode_lock(), so I am not sure we have a good reason to change the
locking in the ovl dir operations.

Thanks,
Amir.
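
P.S. To make the second option concrete, below is a rough, untested sketch
of what an ovl_inode_lock() based ovl_llseek() could look like. It is only
a sketch of the idea, not a tested patch: the surrounding function body is
approximated from fs/overlayfs/file.c, the offset == 0 fast paths are
omitted, and it assumes ovl_inode_lock() is the interruptible variant that
returns an error code:

static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
{
	struct inode *inode = file_inode(file);
	struct fd real;
	const struct cred *old_cred;
	loff_t ret;

	ret = ovl_real_fdget(file, &real);
	if (ret)
		return ret;

	/*
	 * Serialize the f_pos copy to/from the real file with the overlay
	 * inode mutex instead of i_rwsem, so the overlay i_rwsem is never
	 * nested under the real fs sb_writers held by a nested overlay
	 * copy up.
	 */
	ret = ovl_inode_lock(inode);
	if (ret)
		goto out_fdput;

	real.file->f_pos = file->f_pos;

	old_cred = ovl_override_creds(inode->i_sb);
	ret = vfs_llseek(real.file, offset, whence);
	revert_creds(old_cred);

	file->f_pos = real.file->f_pos;
	ovl_inode_unlock(inode);

out_fdput:
	fdput(real);

	return ret;
}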