Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs

Vivek Goyal <vgoyal@xxxxxxxxxx> · Mon, 14 Dec 2020 18:01:21 -0500

On Mon, Dec 14, 2020 at 10:04:14PM +0000, Sargun Dhillon wrote:
> On Mon, Dec 14, 2020 at 04:38:43PM -0500, Vivek Goyal wrote:
> > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > > and record it in the per-sb info. In sync_fs, check for an error since
> > > the recorded point and set it in the overlayfs superblock if there was
> > > one.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > 
> > While we are solving problem for non-volatile overlay mount, I also
> > started thinking, what about non-volatile overlay syncfs() writeback errors.
> > Looks like these will not be reported to user space at all as of now
> > (because we never update overlay_sb->s_wb_err ever).
> > 
> > A patch like this might fix it. (compile tested only).
> > 
> > overlayfs: Report syncfs() errors to user space
> > 
> > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> > return code. But certain writeback errors can still be reported on 
> > syncfs() by checking errors on super block.
> > 
> > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> > 
> > For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> > means sync() will never report writeback errors on overlayfs uppon syncfs().
> > 
> > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> > should mean that user space syncfs() call should see writeback errors.
> > 
> > ovl_fsync() does not need anything special because if there are writeback
> > errors underlying filesystem will report it through vfs_fsync_range() return
> > code and user space will see it.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@xxxxxxxxxx>
> > ---
> >  fs/overlayfs/ovl_entry.h |    1 +
> >  fs/overlayfs/super.c     |   14 +++++++++++---
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > Index: redhat-linux/fs/overlayfs/super.c
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
> >  {
> >  	struct ovl_fs *ofs = sb->s_fs_info;
> >  	struct super_block *upper_sb;
> > -	int ret;
> > +	int ret, ret2;
> >  
> >  	if (!ovl_upper_mnt(ofs))
> >  		return 0;
> > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
> >  	ret = sync_filesystem(upper_sb);
> >  	up_read(&upper_sb->s_umount);
> >  
> > -	return ret;
> > +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > +		/* Upper sb has errors since last time */
> > +		spin_lock(&ofs->errseq_lock);
> > +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> > +						&sb->s_wb_err);
> > +		spin_unlock(&ofs->errseq_lock);
> > +	}
> > +	return ret ? ret : ret2;
> >  }
> >  
> >  /**
> > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
> >  	if (!cred)
> >  		goto out_err;
> >  
> > +	spin_lock_init(&ofs->errseq_lock);
> >  	/* Is there a reason anyone would want not to share whiteouts? */
> >  	ofs->share_whiteout = true;
> >  
> > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
> >  
> >  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> >  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > -
> > +		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> >  	}
> >  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
> >  	err = PTR_ERR(oe);
> > Index: redhat-linux/fs/overlayfs/ovl_entry.h
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
> > @@ -79,6 +79,7 @@ struct ovl_fs {
> >  	atomic_long_t last_ino;
> >  	/* Whiteout dentry cache */
> >  	struct dentry *whiteout;
> > +	spinlock_t errseq_lock;
> >  };
> >  
> >  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > 
> 
> This was on my list of things to look at. I don't think we can / should use 
> errseq_check_and_advance because it will hide errors from userspace.

Hi Sargun,

I have been struggling to figure out when to use
errseq_check_and_advance() and when to use this error_check() and
errorseq_set() combination.

My rational for using errseq_check_and_advance() is that say there
is an unseen error on upper super block, and if overlayfs calls
syncfs(), then this call should set SEEN flag on upper super
block, isn't it. This is equivalent of an app directly calling
syncfs() on upper.

If we use error_check() and errseq_set() combination, then we
are just reading state of upper superblock but really not impacting
it despite the fact overlay apps are calling syncfs().

Comapre it with fsync(). For non-volatile overlay, an fsync() overlay
call will set SEEN flag on upper file. And I believe same thing
should happen for ovl_syncfs() call as well. It makes more sense to me.

Wondering how will it hide errors from user space. ovl "struct file"
will have its own f->f_sb_err initialized from overlay superblock. And
if overlay super block gets updated with error, a later
errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err) should
still return an error.

What am I missing?

Vivek

> I think we 
> need something like:
> 
> At startup, call errseq_peek and stash that value somewhere. This sets the 
> MUSTINC flag.

So this errseq_peek() stuff is required only if we don't want to 
consume error while checking for error. In fact consuming error
is not bad as long as we do it only ovl_syncfs() path. In fact
it will match current semantics of syncfs().

But issue we have is that we want to check for error outside syncfs()
path too and don't want to consume it (otherwise it breaks the semantics
that it will bee seen marked in syncfs() path).

So that's why this notion of checking error without consuming it
so tha we can check it in remount path.

But in syncfs() path, it should be ok to consume unseen error and
we should be able to call errseq_check_and_advance(), both for
volatile and non-volatile mounts, isn't it?

For ther paths, like remount, we probably can stash away on persistent
storage and compare that value on remount and fail remount without
actually consuming unseen error (because it is not syncfs path). This
possibly can be used in other paths like read/write as well to
make sure we can notice error without consuming it.

IOW, we seem to have to paths we want to check errors in. In ovl_syncfs()
path we should be able consume exisiting unseen error on upper, so
errseq_check_and_advance() makes sense. In rest of the paths, we
should use new semantics to check for errors.

Vivek

> 
> At syncfs time: call errseq check, if it says there is an error, call 
> errseq_peek again, and store the error in our superblock. Take the error value 
> from the differenceb between the previous one and the new one, and copy it up to 
> the superblock.
> 
> Either way, I think Jeff's work of making it so other kernel subsytems can 
> interact with errseq on a superblock bears fruit elsewhere. If the first patch 
> gets merged, I can put together the patches to do the standard error bubble
> up for normal syncfs, volatile syncfs, and volatile remount.
>