Re: [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 27 Sep 2023 00:17:27 +0300

On Sun, Sep 24, 2023 at 12:28 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> [adding some folks from the multigrain ctime discussion]
>
> On Tue, Sep 12, 2023 at 8:36 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > ovl_copyattr() may be called concurrently from aio completion context
> > without any lock and that could lead to overlay inode attributes getting
> > permanently out of sync with real inode attributes.
> >
> > Similarly, ovl_file_accessed() is always called without any lock to do
> > "compare & copy" of mtime/ctime from realinode to inode.
> >
> > Use ovl inode spinlock to protect those two helpers.
> >
> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > ---
> >  fs/overlayfs/file.c | 2 ++
> >  fs/overlayfs/util.c | 2 ++
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index 4193633c4c7a..c6ad84cf9246 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file)
> >         if (!upperinode)
> >                 return;
> >
> > +       spin_lock(&inode->i_lock);
> >         ctime = inode_get_ctime(inode);
> >         uctime = inode_get_ctime(upperinode);
> >         if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
> >              !timespec64_equal(&ctime, &uctime))) {
> >                 inode->i_mtime = upperinode->i_mtime;
> >                 inode_set_ctime_to_ts(inode, uctime);
> >         }
> > +       spin_unlock(&inode->i_lock);
> >
>
> [patch manually edited to add missing line to context]
>
> Miklos,
>
> I am having latent concerns about this patch, which is currently
> in overlayfs-next.
>
> What was your reason for the "compare & copy" optimization?
> I assume it was to avoid cache line bouncing. yes?
> Would the added spinlock make this optimization moot?
> I am asking since I calculated that on X86-64 and with
> CONFIG_FS_POSIX_ACL && CONFIG_SECURITY
> i_ctime.tv_nsec and i_lock are on the same cache line.
>
> I should note for the non-overlayfs developers, that we do
> not care to worry about changes to the upperinode that are
> done behind the back of overlayfs.
>
> Ovrerlayfs assumes that it is the only one to make changes
> to the upperinode, otherwise, behavior is undefined.
>
> This is the justification for only taking the overlayfs inode
> i_lock for synchronization of this "compare & copy" routine.
>
> Also, I think we only need to care that the overlayfs inode
> timestamps are eventually consistent, after the last
> ovl_file_accessed() call.
>
> The decraled reason (in commit message) for adding the lock
> here is to protect from races in the ovl aio code path, which was
> not around when the ovl_file_accessed() helper was written.
>
> But now I am wondering:
> - Is the lock needed in all the sync calls?
> - Is it needed even if there was no aio at all?
>
> I think the answer is yes to both questions, so the patch
> can remain in its current form, but I'm not 100% sure.
>
> >         touch_atime(&file->f_path);
> >  }
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 89e0d60d35b6..b7922862ece3 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode)
> >         realinode = ovl_i_path_real(inode, &realpath);
> >         real_idmap = mnt_idmap(realpath.mnt);
> >
> > +       spin_lock(&inode->i_lock);
> >         vfsuid = i_uid_into_vfsuid(real_idmap, realinode);
> >         vfsgid = i_gid_into_vfsgid(real_idmap, realinode);
> >
> >         inode->i_uid = vfsuid_into_kuid(vfsuid);
> >         inode->i_gid = vfsgid_into_kgid(vfsgid);
> >         inode->i_mode = realinode->i_mode;
> >         inode->i_mtime = realinode->i_mtime;
> >         inode_set_ctime_to_ts(inode, inode_get_ctime(realinode));
> >         i_size_write(inode, i_size_read(realinode));
> > +       spin_unlock(&inode->i_lock);
>
> My concerns about this part of the patch is that AFAIK,
> all the calls of ovl_copyattr(), except for the one in aio completion,
> are called with overlayfs inode mutex lock held.
>
> So this lock is strictly only needed because of the aio write case,
> but I think we need to have the lock in place for all the other
> cases to protect them against racing with aio completion?
>
> I guess the overhead of the spinlock is not a worry if the
> mutex is already held, even though we do call ovl_copyattr()
> twice (before and after) in some operations.
>
> Anyway, I just want to make sure that I did not make any
> mistakes in my analysis of the problem and the fix.
>

FYI, for now I dropped this patch from overlayfs-next, because
I was warned that ovl_copyattr() could be called from interrupt
context in aio completion.

I will bring it back after I sort out the ovl aio completion context.

Thanks,
Amir.