Re: [PATCH] overlay: Implement volatile-specific fsync error behaviour

Sargun Dhillon <sargun@xxxxxxxxx> · Wed, 2 Dec 2020 19:03:21 +0000



On Wed, Dec 02, 2020 at 01:56:01PM -0500, Vivek Goyal wrote:
> On Wed, Dec 02, 2020 at 01:22:09PM -0500, Jeff Layton wrote:
> > On Wed, 2020-12-02 at 12:29 -0500, Vivek Goyal wrote:
> > > On Wed, Dec 02, 2020 at 12:02:43PM -0500, Jeff Layton wrote:
> > > 
> > > [..]
> > > > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > > > > index 290983bcfbb3..82a096a05bce 100644
> > > > > > --- a/fs/overlayfs/super.c
> > > > > > +++ b/fs/overlayfs/super.c
> > > > > > @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
> > > > > > >  	struct super_block *upper_sb;
> > > > > > >  	int ret;
> > > > > > >  
> > > > > > -	if (!ovl_upper_mnt(ofs))
> > > > > > -		return 0;
> > > > > > +	ret = ovl_check_sync(ofs);
> > > > > > +	/*
> > > > > > +	 * We have to always set the err, because the return value isn't
> > > > > > +	 * checked, and instead VFS looks at the writeback errseq after
> > > > > > +	 * this call.
> > > > > > +	 */
> > > > > > +	if (ret < 0)
> > > > > > +		errseq_set(&sb->s_wb_err, ret);
> > > > > 
> > > > > > I was wondering why errseq_set() would result in returning an error
> > > > > > every time. Then I realized that the last syncfs() call must have set
> > > > > > the ERRSEQ_SEEN flag, which means errseq_set() will increment the
> > > > > > counter, so this syncfs() will return an error too. Cool.
> > > > > 
> > > > > > +
> > > > > > +	if (!ret)
> > > > > > +		return ret;
> > > > > > >  
> > > > > > > -	if (!ovl_should_sync(ofs))
> > > > > > > -		return 0;
> > > > > > >  	/*
> > > > > > >  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> > > > > > >  	 * All the super blocks will be iterated, including upper_sb.
> > > > > > @@ -1927,6 +1934,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > > > > >  	sb->s_op = &ovl_super_operations;
> > > > > > >  
> > > > > > >  	if (ofs->config.upperdir) {
> > > > > > +		struct super_block *upper_mnt_sb;
> > > > > > +
> > > > > > >  		if (!ofs->config.workdir) {
> > > > > > >  			pr_err("missing 'workdir'\n");
> > > > > > >  			goto out_err;
> > > > > > @@ -1943,9 +1952,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > > > > >  		if (!ofs->workdir)
> > > > > > >  			sb->s_flags |= SB_RDONLY;
> > > > > > >  
> > > > > > -		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > > > > > -		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > > > > > -
> > > > > > +		upper_mnt_sb = ovl_upper_mnt(ofs)->mnt_sb;
> > > > > > +		sb->s_stack_depth = upper_mnt_sb->s_stack_depth;
> > > > > > +		sb->s_time_gran = upper_mnt_sb->s_time_gran;
> > > > > > +		ofs->upper_errseq = errseq_sample(&upper_mnt_sb->s_wb_err);
> > > > > 
> > > > > I asked this question in last email as well. errseq_sample() will return
> > > > > 0 if current error has not been seen yet. That means next time a sync
> > > > > call comes for volatile mount, it will return an error. But that's
> > > > > not what we want. When we mounted a volatile overlay, if there is an
> > > > > existing error (seen/unseen), we don't care. We only care if there
> > > > > is a new error after the volatile mount, right?
> > > > > 
> > > > > > I guess we will need another helper similar to errseq_sample() which
> > > > > just returns existing value of errseq. And then we will have to
> > > > > do something about errseq_check() to not return an error if "since"
> > > > > and "eseq" differ only by "seen" bit.
> > > > > 
> > > > > Otherwise in current form, volatile mount will always return error
> > > > > if upperdir has error and it has not been seen by anybody.
> > > > > 
> > > > > > How did you finally end up testing the error case? Want to simulate
> > > > > > an error artificially and test it.
> > > > > 
> > > > 
> > > > If you don't want to see errors that occurred before you did the mount,
> > > > then you probably can just resurrect and rename the original version of
> > > > errseq_sample. Something like this, but with a different name:
> > > > 
> > > > +errseq_t errseq_sample(errseq_t *eseq)
> > > > +{
> > > > +       errseq_t old = READ_ONCE(*eseq);
> > > > +       errseq_t new = old;
> > > > +
> > > > +       /*
> > > > +        * For the common case of no errors ever having been set, we can skip
> > > > +        * marking the SEEN bit. Once an error has been set, the value will
> > > > +        * never go back to zero.
> > > > +        */
> > > > +       if (old != 0) {
> > > > +               new |= ERRSEQ_SEEN;
> > > > +               if (old != new)
> > > > +                       cmpxchg(eseq, old, new);
> > > > +       }
> > > > +       return new;
> > > > +}
> > > 
> > > Yes, a helper like this should solve the issue at hand. We are not
> > > interested in previous errors. It also sets ERRSEQ_SEEN on
> > > sample, which solves the other issue: if the error gets seen after
> > > we sample, we don't want errseq_check() to return an error.
> > > 
> > > Thinking of some possible names for new function.
> > > 
> > > errseq_sample_seen()
> > > errseq_sample_set_seen()
> > > errseq_sample_consume_unseen()
> > > errseq_sample_current()
> > > 
> > 
> > errseq_sample_consume_unseen() sounds good, though maybe it should be
> > "ignore_unseen"? IDK, naming this stuff is the hardest part.
> > 
> > If you don't want to add a new helper, I think you'd probably also be
> > able to do something like this in fill_super:
> > 
> >     errseq_sample()
> >     errseq_check_and_advance()
> > 
> > 
> > ...and just ignore the error returned by the check and advance. At that
> > point, the cursor should be caught up and any subsequent syncfs call
> > should return 0 until you record another error. It's a little less
> > efficient, but only slightly so.
> 
> This seems even better.
> 
> Thinking a little bit more, I am now concerned about setting ERRSEQ_SEEN on
> sample. In our case, that would mean that we consumed an unseen error but
> never reported it back to user space. And then somebody might complain.
> 
> This kind of reminds me of PostgreSQL's fsync issues where they did
> writes using one fd and another thread opened another fd and
> did sync and they expected any errors to be reported.
> 
> Similarly, what if an unseen error is present on the upper superblock,
> and we mount a volatile overlay and mark the error SEEN? Then
> if another process opens a file on upper and calls syncfs(), it will
> complain that the existing error was not reported to it.
> 
> The overlay use case seems to be that we just want to check whether an
> error has happened on the upper superblock since we sampled it, without
> consuming that error. Would it make sense to introduce
> two helpers for error sampling and error checking which mask the
> SEEN bit and otherwise leave it alone? For example, see the following
> compile-tested-only patch.
> 
> Now we will not touch SEEN bit at all. And even if SEEN gets set
> since we sampled, errseq_check_mask_seen() will not flag it as
> error.
> 
> Thanks
> Vivek
> 
> ---
>  lib/errseq.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> Index: redhat-linux/lib/errseq.c
> ===================================================================
> --- redhat-linux.orig/lib/errseq.c	2020-06-09 08:59:29.712836019 -0400
> +++ redhat-linux/lib/errseq.c	2020-12-02 13:40:08.085775647 -0500
> @@ -130,6 +130,12 @@ errseq_t errseq_sample(errseq_t *eseq)
>  }
>  EXPORT_SYMBOL(errseq_sample);
>  
> +errseq_t errseq_sample_mask_seen(errseq_t *eseq)
> +{
> +	return READ_ONCE(*eseq) & (~ERRSEQ_SEEN);
> +}
> +EXPORT_SYMBOL(errseq_sample_mask_seen);
> +
If errseq_check_mask_seen() below already does since &= ~ERRSEQ_SEEN,
I see no reason to strip the bit here as well; why not just use
READ_ONCE() directly?

>  /**
>   * errseq_check() - Has an error occurred since a particular sample point?
>   * @eseq: Pointer to errseq_t value to be checked.
> @@ -151,6 +157,17 @@ int errseq_check(errseq_t *eseq, errseq_
>  }
>  EXPORT_SYMBOL(errseq_check);
>  
> +int errseq_check_mask_seen(errseq_t *eseq, errseq_t since)
> +{
> +	errseq_t cur = READ_ONCE(*eseq) & (~ERRSEQ_SEEN);
> +
> +	since &= ~ERRSEQ_SEEN;
> +	if (likely(cur == since))
> +		return 0;
> +	return -(cur & MAX_ERRNO);
> +}
> +EXPORT_SYMBOL(errseq_check_mask_seen);
> +
This ignores the wrapping case, where cur does not have SEEN set
but since does.
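
To make the SEEN-masking behaviour easier to reason about, here is a rough
userspace model of the two proposed helpers. It is only a sketch: it is
single-threaded (no READ_ONCE/cmpxchg), the model_* names are made up for
illustration, and the bit layout (12 errno bits, then the SEEN bit, then the
counter) is my reading of lib/errseq.c rather than something to rely on.

```c
/* Userspace, single-threaded model of errseq_t; hypothetical, not kernel code. */
#include <assert.h>
#include <stdint.h>

typedef uint32_t errseq_t;

#define MAX_ERRNO       4095u
#define ERRSEQ_SHIFT    12                        /* ilog2(MAX_ERRNO + 1) */
#define ERRSEQ_SEEN     (1u << ERRSEQ_SHIFT)      /* "someone looked" flag */
#define ERRSEQ_CTR_INC  (1u << (ERRSEQ_SHIFT + 1))

/* Models errseq_set(): record err, bumping the counter only if SEEN was set. */
static errseq_t model_errseq_set(errseq_t *eseq, int err)
{
	errseq_t old = *eseq;
	errseq_t new;

	assert(err < 0 && (unsigned int)-err <= MAX_ERRNO);
	new = (old & ~(MAX_ERRNO | ERRSEQ_SEEN)) | (errseq_t)-err;
	if (old & ERRSEQ_SEEN)
		new += ERRSEQ_CTR_INC;
	*eseq = new;
	return new;
}

/* Models another syncfs() caller observing the error: only SEEN changes. */
static void model_mark_seen(errseq_t *eseq)
{
	if (*eseq != 0)
		*eseq |= ERRSEQ_SEEN;
}

/* Models the proposed sample helper: current value with SEEN masked off. */
static errseq_t model_sample_mask_seen(const errseq_t *eseq)
{
	return *eseq & ~ERRSEQ_SEEN;
}

/* Models the proposed check: compare with SEEN masked on both sides. */
static int model_check_mask_seen(const errseq_t *eseq, errseq_t since)
{
	errseq_t cur = *eseq & ~ERRSEQ_SEEN;

	since &= ~ERRSEQ_SEEN;
	if (cur == since)
		return 0;
	return -(int)(cur & MAX_ERRNO);
}
```

In this model, sampling an unseen error, letting someone else mark it SEEN,
and then checking returns 0, while a later model_errseq_set() makes the check
return the new errno. Because the check masks since itself, a raw copy of the
value behaves the same as the masked sample.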

>  /**
>   * errseq_check_and_advance() - Check an errseq_t and advance to current value.
>   * @eseq: Pointer to value being checked and reported.
>