On Wed, Sep 27, 2023 at 12:17 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Sun, Sep 24, 2023 at 12:28 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > [adding some folks from the multigrain ctime discussion] > > > > On Tue, Sep 12, 2023 at 8:36 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > ovl_copyattr() may be called concurrently from aio completion context > > > without any lock and that could lead to overlay inode attributes getting > > > permanently out of sync with real inode attributes. > > > > > > Similarly, ovl_file_accessed() is always called without any lock to do > > > "compare & copy" of mtime/ctime from realinode to inode. > > > > > > Use ovl inode spinlock to protect those two helpers. > > > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > --- > > > fs/overlayfs/file.c | 2 ++ > > > fs/overlayfs/util.c | 2 ++ > > > 2 files changed, 4 insertions(+) > > > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > > index 4193633c4c7a..c6ad84cf9246 100644 > > > --- a/fs/overlayfs/file.c > > > +++ b/fs/overlayfs/file.c > > > @@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file) > > > if (!upperinode) > > > return; > > > > > > + spin_lock(&inode->i_lock); > > > ctime = inode_get_ctime(inode); > > > uctime = inode_get_ctime(upperinode); > > > if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) || > > > !timespec64_equal(&ctime, &uctime))) { > > > inode->i_mtime = upperinode->i_mtime; > > > inode_set_ctime_to_ts(inode, uctime); > > > } > > > + spin_unlock(&inode->i_lock); > > > > > > > [patch manually edited to add missing line to context] > > > > Miklos, > > > > I am having latent concerns about this patch, which is currently > > in overlayfs-next. > > > > What was your reason for the "compare & copy" optimization? > > I assume it was to avoid cache line bouncing. yes? > > Would the added spinlock make this optimization moot? > > I am asking since I calculated that on X86-64 and with > > CONFIG_FS_POSIX_ACL && CONFIG_SECURITY > > i_ctime.tv_nsec and i_lock are on the same cache line. > > > > I should note for the non-overlayfs developers, that we do > > not care to worry about changes to the upperinode that are > > done behind the back of overlayfs. > > > > Ovrerlayfs assumes that it is the only one to make changes > > to the upperinode, otherwise, behavior is undefined. > > > > This is the justification for only taking the overlayfs inode > > i_lock for synchronization of this "compare & copy" routine. > > > > Also, I think we only need to care that the overlayfs inode > > timestamps are eventually consistent, after the last > > ovl_file_accessed() call. > > > > The decraled reason (in commit message) for adding the lock > > here is to protect from races in the ovl aio code path, which was > > not around when the ovl_file_accessed() helper was written. > > > > But now I am wondering: > > - Is the lock needed in all the sync calls? > > - Is it needed even if there was no aio at all? > > > > I think the answer is yes to both questions, so the patch > > can remain in its current form, but I'm not 100% sure. > > > > > touch_atime(&file->f_path); > > > } > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > > index 89e0d60d35b6..b7922862ece3 100644 > > > --- a/fs/overlayfs/util.c > > > +++ b/fs/overlayfs/util.c > > > @@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode) > > > realinode = ovl_i_path_real(inode, &realpath); > > > real_idmap = mnt_idmap(realpath.mnt); > > > > > > + spin_lock(&inode->i_lock); > > > vfsuid = i_uid_into_vfsuid(real_idmap, realinode); > > > vfsgid = i_gid_into_vfsgid(real_idmap, realinode); > > > > > > inode->i_uid = vfsuid_into_kuid(vfsuid); > > > inode->i_gid = vfsgid_into_kgid(vfsgid); > > > inode->i_mode = realinode->i_mode; > > > inode->i_mtime = realinode->i_mtime; > > > inode_set_ctime_to_ts(inode, inode_get_ctime(realinode)); > > > i_size_write(inode, i_size_read(realinode)); > > > + spin_unlock(&inode->i_lock); > > > > My concerns about this part of the patch is that AFAIK, > > all the calls of ovl_copyattr(), except for the one in aio completion, > > are called with overlayfs inode mutex lock held. > > > > So this lock is strictly only needed because of the aio write case, > > but I think we need to have the lock in place for all the other > > cases to protect them against racing with aio completion? > > > > I guess the overhead of the spinlock is not a worry if the > > mutex is already held, even though we do call ovl_copyattr() > > twice (before and after) in some operations. > > > > Anyway, I just want to make sure that I did not make any > > mistakes in my analysis of the problem and the fix. > > > > FYI, for now I dropped this patch from overlayfs-next, because > I was warned that ovl_copyattr() could be called from interrupt > context in aio completion. > > I will bring it back after I sort out the ovl aio completion context. > FYI, after applying "ovl: punt write aio completion to workqueue" [1] to overlayfs-next, I restored the half of this patch that adds spinlock protection to ovl_copyattr(). But I dropped the half that adds spinlock protection to ovl_file_accessed(), because I do not want to add a lock to the read path and because ovl_file_accessed() is not anymore unsafe than relatime_need_update() already is w.r.t atomic access to struct timespec64 and w.r.t atomic access to mtime/ctime/atime fields, so I guess nobody really cares so much about being very strict with relatime rules... Thanks, Amir. [1] https://lore.kernel.org/linux-unionfs/20230928064636.487317-1-amir73il@xxxxxxxxx/