Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs

On Mon, Dec 14, 2020 at 04:38:43PM -0500, Vivek Goyal wrote:
> On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > and record it in the per-sb info. In sync_fs, check for an error since
> > the recorded point and set it in the overlayfs superblock if there was
> > one.
> > 
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> 
> While we are solving this problem for volatile overlay mounts, I also
> started thinking: what about non-volatile overlay syncfs() writeback errors?
> It looks like these are not reported to user space at all right now
> (because we never update overlay_sb->s_wb_err).
> 
> A patch like this might fix it. (compile-tested only)
> 
> overlayfs: Report syncfs() errors to user space
> 
> Currently, syncfs() calls the filesystem's ->sync_fs() method but ignores
> the return code. However, certain writeback errors can still be reported
> by syncfs() by checking errors on the super block.
> 
> ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> 
> In the case of overlayfs, we never set the overlayfs super block's s_wb_err.
> That means syncfs() will never report writeback errors on overlayfs.
> 
> Fix this by updating the overlay sb->s_wb_err upon the ->sync_fs() call. With
> that, user space syncfs() calls should see writeback errors.
> 
> ovl_fsync() does not need anything special because, if there are writeback
> errors, the underlying filesystem will report them through the
> vfs_fsync_range() return code and user space will see them.
> 
> Signed-off-by: Vivek Goyal <vgoyal@xxxxxxxxxx>
> ---
>  fs/overlayfs/ovl_entry.h |    1 +
>  fs/overlayfs/super.c     |   14 +++++++++++---
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> Index: redhat-linux/fs/overlayfs/super.c
> ===================================================================
> --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
>  {
>  	struct ovl_fs *ofs = sb->s_fs_info;
>  	struct super_block *upper_sb;
> -	int ret;
> +	int ret, ret2 = 0;
>  
>  	if (!ovl_upper_mnt(ofs))
>  		return 0;
> @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
>  	ret = sync_filesystem(upper_sb);
>  	up_read(&upper_sb->s_umount);
>  
> -	return ret;
> +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> +		/* Upper sb has errors since last time */
> +		spin_lock(&ofs->errseq_lock);
> +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> +						&sb->s_wb_err);
> +		spin_unlock(&ofs->errseq_lock);
> +	}
> +	return ret ? ret : ret2;
>  }
>  
>  /**
> @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
>  	if (!cred)
>  		goto out_err;
>  
> +	spin_lock_init(&ofs->errseq_lock);
>  	/* Is there a reason anyone would want not to share whiteouts? */
>  	ofs->share_whiteout = true;
>  
> @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
>  
>  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
>  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> -
> +		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
>  	}
>  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
>  	err = PTR_ERR(oe);
> Index: redhat-linux/fs/overlayfs/ovl_entry.h
> ===================================================================
> --- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
> +++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
> @@ -79,6 +79,7 @@ struct ovl_fs {
>  	atomic_long_t last_ino;
>  	/* Whiteout dentry cache */
>  	struct dentry *whiteout;
> +	spinlock_t errseq_lock;
>  };
>  
>  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> 

This was on my list of things to look at. I don't think we can / should use 
errseq_check_and_advance because it will hide errors from userspace. I think we 
need something like:

At startup, call errseq_peek and stash that value somewhere. This sets the 
MUSTINC flag.

At syncfs time: call errseq_check; if it says there is an error, call
errseq_peek again and stash the new value. Take the error value from the
difference between the previous value and the new one, and copy it up to the
overlay superblock, as sketched below.

Either way, I think Jeff's work of making it so other kernel subsystems can
interact with errseq on a superblock bears fruit elsewhere. If the first patch
gets merged, I can put together the patches to do the standard error bubble-up
for normal syncfs, volatile syncfs, and volatile remount.
