On Wed, Jan 06, 2021 at 02:46:58PM -0500, Vivek Goyal wrote:
> On Wed, Jan 06, 2021 at 12:35:46AM -0800, Sargun Dhillon wrote:
> > Overlayfs's volatile option allows the user to bypass all forced sync calls
> > to the upperdir filesystem. This comes at the cost of safety. We can never
> > ensure that the user's data is intact, but we can make a best effort to
> > expose whether or not the data is likely to be in a bad state.
> > 
> > The best way to handle this in the time being is that if an overlayfs's
> > upperdir experiences an error after a volatile mount occurs, that error
> > will be returned on fsync, fdatasync, sync, and syncfs. This is
> > contradictory to the traditional behaviour of VFS which fails the call
> > once, and only raises an error if a subsequent fsync error has occurred,
> > and been raised by the filesystem.
> > 
> > One awkward aspect of the patch is that we have to manually set the
> > superblock's errseq_t after the sync_fs callback as opposed to just
> > returning an error from syncfs. This is because the call chain looks
> > something like this:
> > 
> > sys_syncfs ->
> >     sync_filesystem ->
> >         __sync_filesystem ->
> >             /* The return value is ignored here
> >             sb->s_op->sync_fs(sb)
> >             _sync_blockdev
> >         /* Where the VFS fetches the error to raise to userspace */
> >         errseq_check_and_advance
> > 
> > Because of this we call errseq_set every time the sync_fs callback occurs.
> 
> Why not start capturing return code of ->sync_fs and then return error
> from ovl->sync_fs. And then you don't have to do errseq_set(ovl_sb).
> 
> I already posted a patch to capture return code from ->sync_fs.
> 
> https://lore.kernel.org/linux-fsdevel/20201221195055.35295-2-vgoyal@xxxxxxxxxx/
> 
The intent is for this patch to go into stable as a minimal fix that
prevents overlayfs volatile mounts from exhibiting unintended behaviour.
I think your changes are still valid and can sit on top of this one (at
which point you can remove the errseq_set).
I believe the consensus was that changing the behaviour for all
filesystems posed too much risk for such a change to land in stable.

> 
> > 
> > Due to the nature of this seen / unseen dichotomy, if the upperdir is an
> > inconsistent state at the initial mount time, overlayfs will refuse to
> > mount, as overlayfs cannot get a snapshot of the upperdir's errseq that
> > will increment on error until the user calls syncfs.
> > 
> > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx>
> > Suggested-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> > Cc: linux-unionfs@xxxxxxxxxxxxxxx
> > Cc: Jeff Layton <jlayton@xxxxxxxxxx>
> > Cc: Miklos Szeredi <miklos@xxxxxxxxxx>
> > Cc: Amir Goldstein <amir73il@xxxxxxxxx>
> > Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > ---
> >  Documentation/filesystems/overlayfs.rst |  8 +++++++
> >  fs/overlayfs/file.c                     |  5 ++--
> >  fs/overlayfs/overlayfs.h                |  1 +
> >  fs/overlayfs/ovl_entry.h                |  2 ++
> >  fs/overlayfs/readdir.c                  |  5 ++--
> >  fs/overlayfs/super.c                    | 32 +++++++++++++++++++------
> >  fs/overlayfs/util.c                     | 27 +++++++++++++++++++++
> >  7 files changed, 69 insertions(+), 11 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> > index 580ab9a0fe31..3af569cea6a7 100644
> > --- a/Documentation/filesystems/overlayfs.rst
> > +++ b/Documentation/filesystems/overlayfs.rst
> > @@ -575,6 +575,14 @@ without significant effort.
> >  The advantage of mounting with the "volatile" option is that all forms of
> >  sync calls to the upper filesystem are omitted.
> >  
> > +In order to avoid a giving a false sense of safety, the syncfs (and fsync)
> > +semantics of volatile mounts are slightly different than that of the rest of
> > +VFS. If any error occurs on the upperdir's filesystem after a volatile mount
>                 ^^^
> should we say "If any writeback error occurs...."
> 
Sure.
> > +takes place, all sync functions will return the last error observed on the
> > +upperdir filesystem. Once this condition is reached, the filesystem will not
> > +recover, and every subsequent sync call will return an error, even if the
> > +upperdir has not experience a new error since the last sync call.
> 
> Once filesystem fails, do we want to continue to return latest error on
> upper? Or we just mark filesystem failed internally and once failed
> we always return a fixed error, say -EIO. That way we don't have to
> call errseq_check() on every filesystem call. I am assuming at some
> point of time we will extend this to other filesystem functions
> like read()/write()/mmap() etc. Filesystem has failed at this point
> of time and user is supposed to throw away upper and restart.
> 
I think we discussed this in another thread (adding filesystem shutdown
[1]). Once this lands, we can go a number of ways in -next, adding
shutdown, direct error return, and volatile remount, but I'd rather get
something minimal into stable sooner rather than later.

> > +
> >  When overlay is mounted with "volatile" option, the directory
> >  "$workdir/work/incompat/volatile" is created. During next mount, overlay
> >  checks for this directory and refuses to mount if present. This is a strong
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index a1f72ac053e5..5c5c3972ebd0 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -445,8 +445,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> >  	const struct cred *old_cred;
> >  	int ret;
> >  
> > -	if (!ovl_should_sync(OVL_FS(file_inode(file)->i_sb)))
> > -		return 0;
> > +	ret = ovl_sync_status(OVL_FS(file_inode(file)->i_sb));
> > +	if (ret <= 0)
> > +		return ret;
> >  
> >  	ret = ovl_real_fdget_meta(file, &real, !datasync);
> >  	if (ret)
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index f8880aa2ba0e..9f7af98ae200 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -322,6 +322,7 @@ int ovl_check_metacopy_xattr(struct ovl_fs *ofs, struct dentry *dentry);
> >  bool ovl_is_metacopy_dentry(struct dentry *dentry);
> >  char *ovl_get_redirect_xattr(struct ovl_fs *ofs, struct dentry *dentry,
> >  			     int padding);
> > +int ovl_sync_status(struct ovl_fs *ofs);
> >  
> >  static inline bool ovl_is_impuredir(struct super_block *sb,
> >  				    struct dentry *dentry)
> > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> > index 1b5a2094df8e..b208eba5d0b6 100644
> > --- a/fs/overlayfs/ovl_entry.h
> > +++ b/fs/overlayfs/ovl_entry.h
> > @@ -79,6 +79,8 @@ struct ovl_fs {
> >  	atomic_long_t last_ino;
> >  	/* Whiteout dentry cache */
> >  	struct dentry *whiteout;
> > +	/* r/o snapshot of upperdir sb's only taken on volatile mounts */
> > +	errseq_t errseq;
> >  };
> >  
> >  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 01620ebae1bd..a273ef901e57 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -909,8 +909,9 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end,
> >  	struct file *realfile;
> >  	int err;
> >  
> > -	if (!ovl_should_sync(OVL_FS(file->f_path.dentry->d_sb)))
> > -		return 0;
> > +	err = ovl_sync_status(OVL_FS(file->f_path.dentry->d_sb));
> > +	if (err <= 0)
> > +		return err;
> >  
> >  	realfile = ovl_dir_real_file(file, true);
> >  	err = PTR_ERR_OR_ZERO(realfile);
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 290983bcfbb3..b917b456bbb4 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
> >  	struct super_block *upper_sb;
> >  	int ret;
> >  
> > -	if (!ovl_upper_mnt(ofs))
> > -		return 0;
> > +	ret = ovl_sync_status(ofs);
> > +	/*
> > +	 * We have to always set the err, because the return value isn't
> > +	 * checked in syncfs, and instead indirectly return an error via
> > +	 * the sb's writeback errseq, which VFS inspects after this call.
> > +	 */
> > +	if (ret < 0)
> > +		errseq_set(&sb->s_wb_err, ret);
> 
> Again, I think we can simplify this. If we just capture return code of
> ->sync_fs in VFS and return to user space, we can simply return an
> error instead of trying to play this game of setting errseq on overlay
> superblock.
> 
> Thanks
> Vivek
> 
If you want to land that in stable, I'm fine with returning an error
directly, but I'll leave that up to Al and Matthew.

[1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhra_RB98gJ7ovGhbUV1atCR1rMPnf63tT37WtrNC0asg@xxxxxxxxxxxxxx/T/#u