Re: [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode

Amir Goldstein <amir73il@xxxxxxxxx> · Sun, 1 Oct 2023 17:11:12 +0300

On Wed, Sep 27, 2023 at 12:17 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Sun, Sep 24, 2023 at 12:28 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > [adding some folks from the multigrain ctime discussion]
> >
> > On Tue, Sep 12, 2023 at 8:36 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > ovl_copyattr() may be called concurrently from aio completion context
> > > without any lock and that could lead to overlay inode attributes getting
> > > permanently out of sync with real inode attributes.
> > >
> > > Similarly, ovl_file_accessed() is always called without any lock to do
> > > "compare & copy" of mtime/ctime from realinode to inode.
> > >
> > > Use ovl inode spinlock to protect those two helpers.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > ---
> > >  fs/overlayfs/file.c | 2 ++
> > >  fs/overlayfs/util.c | 2 ++
> > >  2 files changed, 4 insertions(+)
> > >
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index 4193633c4c7a..c6ad84cf9246 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file)
> > >         if (!upperinode)
> > >                 return;
> > >
> > > +       spin_lock(&inode->i_lock);
> > >         ctime = inode_get_ctime(inode);
> > >         uctime = inode_get_ctime(upperinode);
> > >         if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
> > >              !timespec64_equal(&ctime, &uctime))) {
> > >                 inode->i_mtime = upperinode->i_mtime;
> > >                 inode_set_ctime_to_ts(inode, uctime);
> > >         }
> > > +       spin_unlock(&inode->i_lock);
> > >
> >
> > [patch manually edited to add missing line to context]
> >
> > Miklos,
> >
> > I am having latent concerns about this patch, which is currently
> > in overlayfs-next.
> >
> > What was your reason for the "compare & copy" optimization?
> > I assume it was to avoid cache line bouncing. yes?
> > Would the added spinlock make this optimization moot?
> > I am asking since I calculated that on X86-64 and with
> > CONFIG_FS_POSIX_ACL && CONFIG_SECURITY
> > i_ctime.tv_nsec and i_lock are on the same cache line.
> >
> > I should note for the non-overlayfs developers, that we do
> > not care to worry about changes to the upperinode that are
> > done behind the back of overlayfs.
> >
> > Ovrerlayfs assumes that it is the only one to make changes
> > to the upperinode, otherwise, behavior is undefined.
> >
> > This is the justification for only taking the overlayfs inode
> > i_lock for synchronization of this "compare & copy" routine.
> >
> > Also, I think we only need to care that the overlayfs inode
> > timestamps are eventually consistent, after the last
> > ovl_file_accessed() call.
> >
> > The decraled reason (in commit message) for adding the lock
> > here is to protect from races in the ovl aio code path, which was
> > not around when the ovl_file_accessed() helper was written.
> >
> > But now I am wondering:
> > - Is the lock needed in all the sync calls?
> > - Is it needed even if there was no aio at all?
> >
> > I think the answer is yes to both questions, so the patch
> > can remain in its current form, but I'm not 100% sure.
> >
> > >         touch_atime(&file->f_path);
> > >  }
> > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > > index 89e0d60d35b6..b7922862ece3 100644
> > > --- a/fs/overlayfs/util.c
> > > +++ b/fs/overlayfs/util.c
> > > @@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode)
> > >         realinode = ovl_i_path_real(inode, &realpath);
> > >         real_idmap = mnt_idmap(realpath.mnt);
> > >
> > > +       spin_lock(&inode->i_lock);
> > >         vfsuid = i_uid_into_vfsuid(real_idmap, realinode);
> > >         vfsgid = i_gid_into_vfsgid(real_idmap, realinode);
> > >
> > >         inode->i_uid = vfsuid_into_kuid(vfsuid);
> > >         inode->i_gid = vfsgid_into_kgid(vfsgid);
> > >         inode->i_mode = realinode->i_mode;
> > >         inode->i_mtime = realinode->i_mtime;
> > >         inode_set_ctime_to_ts(inode, inode_get_ctime(realinode));
> > >         i_size_write(inode, i_size_read(realinode));
> > > +       spin_unlock(&inode->i_lock);
> >
> > My concerns about this part of the patch is that AFAIK,
> > all the calls of ovl_copyattr(), except for the one in aio completion,
> > are called with overlayfs inode mutex lock held.
> >
> > So this lock is strictly only needed because of the aio write case,
> > but I think we need to have the lock in place for all the other
> > cases to protect them against racing with aio completion?
> >
> > I guess the overhead of the spinlock is not a worry if the
> > mutex is already held, even though we do call ovl_copyattr()
> > twice (before and after) in some operations.
> >
> > Anyway, I just want to make sure that I did not make any
> > mistakes in my analysis of the problem and the fix.
> >
>
> FYI, for now I dropped this patch from overlayfs-next, because
> I was warned that ovl_copyattr() could be called from interrupt
> context in aio completion.
>
> I will bring it back after I sort out the ovl aio completion context.
>

FYI, after applying "ovl: punt write aio completion to workqueue" [1]
to overlayfs-next, I restored the half of this patch that adds
spinlock protection to ovl_copyattr().

But I dropped the half that adds spinlock protection to
ovl_file_accessed(), because I do not want to add a lock to the read
path and because ovl_file_accessed() is not anymore unsafe than
relatime_need_update() already is w.r.t atomic access to struct timespec64
and w.r.t atomic access to mtime/ctime/atime fields, so I guess nobody
really cares so much about being very strict with relatime rules...

Thanks,
Amir.

[1] https://lore.kernel.org/linux-unionfs/20230928064636.487317-1-amir73il@xxxxxxxxx/