[adding some folks from the multigrain ctime discussion]

On Tue, Sep 12, 2023 at 8:36 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> ovl_copyattr() may be called concurrently from aio completion context
> without any lock and that could lead to overlay inode attributes getting
> permanently out of sync with real inode attributes.
>
> Similarly, ovl_file_accessed() is always called without any lock to do
> "compare & copy" of mtime/ctime from realinode to inode.
>
> Use ovl inode spinlock to protect those two helpers.
>
> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> ---
>  fs/overlayfs/file.c | 2 ++
>  fs/overlayfs/util.c | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 4193633c4c7a..c6ad84cf9246 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file)
>          if (!upperinode)
>                  return;
>
> +        spin_lock(&inode->i_lock);
>          ctime = inode_get_ctime(inode);
>          uctime = inode_get_ctime(upperinode);
>          if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
>               !timespec64_equal(&ctime, &uctime))) {
>                  inode->i_mtime = upperinode->i_mtime;
>                  inode_set_ctime_to_ts(inode, uctime);
>          }
> +        spin_unlock(&inode->i_lock);
>

[patch manually edited to add missing line to context]

Miklos,

I am having latent concerns about this patch, which is currently in
overlayfs-next.

What was your reason for the "compare & copy" optimization?
I assume it was to avoid cache line bouncing, yes?
Would the added spinlock make this optimization moot?
I am asking because I calculated that on x86-64, with
CONFIG_FS_POSIX_ACL && CONFIG_SECURITY, i_ctime.tv_nsec and i_lock
land on the same cache line.

I should note, for the non-overlayfs developers, that we do not need
to worry about changes made to the upperinode behind overlayfs's back.
Overlayfs assumes that it is the only one making changes to the
upperinode; otherwise, behavior is undefined. This is the
justification for taking only the overlayfs inode i_lock to
synchronize this "compare & copy" routine.

Also, I think we only need the overlayfs inode timestamps to be
eventually consistent after the last ovl_file_accessed() call.

The declared reason (in the commit message) for adding the lock here
is to protect against races with the ovl aio code path, which did not
exist when the ovl_file_accessed() helper was written. But now I am
wondering:
- Is the lock needed in all the sync calls?
- Is it needed even if there were no aio at all?

I think the answer to both questions is yes, so the patch can remain
in its current form, but I'm not 100% sure.
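To spell out the pattern being discussed, here is the "compare & copy"
logic factored out as a standalone sketch. The helper name is made up
for illustration; the primitives and the locking are the ones from the
hunk above:

#include <linux/fs.h>

/*
 * Hypothetical helper, equivalent to the locked hunk above.
 * Only the overlay inode's i_lock is taken: overlayfs assumes it is
 * the sole writer of the upper inode, so upperinode needs no lock here.
 */
static void ovl_copy_times_sketch(struct inode *inode,
                                  struct inode *upperinode)
{
        struct timespec64 ctime, uctime;

        spin_lock(&inode->i_lock);
        ctime = inode_get_ctime(inode);
        uctime = inode_get_ctime(upperinode);
        /*
         * The compare presumably exists so that an already up-to-date
         * inode is not dirtied; whether it still helps now that i_lock
         * (possibly on the same cache line) is taken is exactly the
         * question above.
         */
        if (!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
            !timespec64_equal(&ctime, &uctime)) {
                inode->i_mtime = upperinode->i_mtime;
                inode_set_ctime_to_ts(inode, uctime);
        }
        spin_unlock(&inode->i_lock);
}

Without the spin_lock/spin_unlock pair, a concurrent ovl_copyattr()
(e.g. from aio completion) and ovl_file_accessed() can interleave their
loads and stores of the timestamp fields, which is how the overlay
inode can end up out of sync with the real inode, as the commit
message says.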
>          touch_atime(&file->f_path);
>  }
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 89e0d60d35b6..b7922862ece3 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode)
>          realinode = ovl_i_path_real(inode, &realpath);
>          real_idmap = mnt_idmap(realpath.mnt);
>
> +        spin_lock(&inode->i_lock);
>          vfsuid = i_uid_into_vfsuid(real_idmap, realinode);
>          vfsgid = i_gid_into_vfsgid(real_idmap, realinode);
>
>          inode->i_uid = vfsuid_into_kuid(vfsuid);
>          inode->i_gid = vfsgid_into_kgid(vfsgid);
>          inode->i_mode = realinode->i_mode;
>          inode->i_mtime = realinode->i_mtime;
>          inode_set_ctime_to_ts(inode, inode_get_ctime(realinode));
>          i_size_write(inode, i_size_read(realinode));
> +        spin_unlock(&inode->i_lock);

My concern about this part of the patch is that, AFAIK, all callers of
ovl_copyattr(), except the one in aio completion, run with the
overlayfs inode mutex held. So the lock is strictly needed only
because of the aio write case, but I think we need to take it in all
the other cases as well, to protect them against racing with aio
completion?

I guess the overhead of the spinlock is not a worry when the mutex is
already held, even though some operations call ovl_copyattr() twice
(before and after). A rough sketch of the two call paths follows below
my signature.

Anyway, I just want to make sure that I did not make any mistakes in my
analysis of the problem and the fix.

Thanks,
Amir.
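Here is the sketch of the two call paths I mentioned. The two wrapper
functions are made up for illustration; only ovl_copyattr() (declared
in fs/overlayfs/overlayfs.h) and the generic inode locking primitives
are real:

#include <linux/fs.h>

/* Declared in fs/overlayfs/overlayfs.h. */
void ovl_copyattr(struct inode *inode);

/*
 * Illustrative sync path: the caller already holds the overlayfs inode
 * rwsem when it refreshes the overlay attributes from the real inode.
 */
static void sync_path_sketch(struct inode *ovl_inode)
{
        inode_lock(ovl_inode);
        /* ... operate on the real inode ... */
        ovl_copyattr(ovl_inode);        /* now also takes ovl i_lock */
        inode_unlock(ovl_inode);
}

/*
 * Illustrative aio completion: runs with no overlayfs inode lock held,
 * so the i_lock taken inside ovl_copyattr() is the only thing that
 * serializes this copy against the one in the sync path above.
 */
static void aio_complete_sketch(struct inode *ovl_inode)
{
        ovl_copyattr(ovl_inode);
}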