On Wed, Dec 02, 2020 at 12:29:06PM -0500, Vivek Goyal wrote:
> On Wed, Dec 02, 2020 at 12:02:43PM -0500, Jeff Layton wrote:
> > [..]
> > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > > index 290983bcfbb3..82a096a05bce 100644
> > > > --- a/fs/overlayfs/super.c
> > > > +++ b/fs/overlayfs/super.c
> > > > @@ -261,11 +261,18 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
> > > >  	struct super_block *upper_sb;
> > > >  	int ret;
> > > >
> > > > -	if (!ovl_upper_mnt(ofs))
> > > > -		return 0;
> > > > +	ret = ovl_check_sync(ofs);
> > > > +	/*
> > > > +	 * We have to always set the err, because the return value isn't
> > > > +	 * checked, and instead VFS looks at the writeback errseq after
> > > > +	 * this call.
> > > > +	 */
> > > > +	if (ret < 0)
> > > > +		errseq_set(&sb->s_wb_err, ret);
> > >
> > > I was wondering why errseq_set() will result in returning error
> > > all the time. Then realized that last syncfs() call must have set
> > > ERRSEQ_SEEN flag and that will mean errseq_set() will increment
> > > counter and that means this syncfs() will return error too. Cool.
> > >
> > > > +
> > > > +	if (!ret)
> > > > +		return ret;
> > > >
> > > > -	if (!ovl_should_sync(ofs))
> > > > -		return 0;
> > > >  	/*
> > > >  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
> > > >  	 * All the super blocks will be iterated, including upper_sb.
> > > > @@ -1927,6 +1934,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > >  	sb->s_op = &ovl_super_operations;
> > > >
> > > >  	if (ofs->config.upperdir) {
> > > > +		struct super_block *upper_mnt_sb;
> > > > +
> > > >  		if (!ofs->config.workdir) {
> > > >  			pr_err("missing 'workdir'\n");
> > > >  			goto out_err;
> > > > @@ -1943,9 +1952,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> > > >  		if (!ofs->workdir)
> > > >  			sb->s_flags |= SB_RDONLY;
> > > >
> > > > -		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> > > > -		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > > > -
> > > > +		upper_mnt_sb = ovl_upper_mnt(ofs)->mnt_sb;
> > > > +		sb->s_stack_depth = upper_mnt_sb->s_stack_depth;
> > > > +		sb->s_time_gran = upper_mnt_sb->s_time_gran;
> > > > +		ofs->upper_errseq = errseq_sample(&upper_mnt_sb->s_wb_err);
> > >
> > > I asked this question in last email as well. errseq_sample() will return
> > > 0 if current error has not been seen yet. That means next time a sync
> > > call comes for volatile mount, it will return an error. But that's
> > > not what we want. When we mounted a volatile overlay, if there is an
> > > existing error (seen/unseen), we don't care. We only care if there
> > > is a new error after the volatile mount, right?
> > >
> > > I guess we will need another helper similar to errseq_sample() which
> > > just returns existing value of errseq. And then we will have to
> > > do something about errseq_check() to not return an error if "since"
> > > and "eseq" differ only by "seen" bit.
> > >
> > > Otherwise in current form, volatile mount will always return error
> > > if upperdir has error and it has not been seen by anybody.
> > >
> > > How did you finally end up testing the error case. Want to simulate
> > > error artificially and test it.
> > >

I used the blockdev error injection layer.
It only works with ext2, because ext4 (and other filesystems) will error
out and go read-only.

dd if=/dev/zero of=/tmp/loop bs=1M count=100
losetup /dev/loop8 /tmp/loop
mkfs.ext2 /dev/loop8
mount -o errors=continue /dev/loop8 /mnt/loop/
mkdir -p /mnt/loop/{upperdir,workdir}
mount -t overlay -o volatile,index=off,lowerdir=/root/lowerdir,upperdir=/mnt/loop/upperdir,workdir=/mnt/loop/workdir none /mnt/foo/
echo 1 > /sys/block/loop8/make-it-fail
echo 100 > /sys/kernel/debug/fail_make_request/probability
echo 1 > /sys/kernel/debug/fail_make_request/times
dd if=/dev/zero of=/mnt/foo/zero bs=1M count=1
sync

I tried to get XFS tests working, but I was unable to get a simpler repro
than the above. This is also easy enough to do with a simple kernel module.
Maybe it'd be neat to be able to inject errseq increments via the fault
injection API one day? I have no idea what the VFS's approach here is.

> >
> > If you don't want to see errors that occurred before you did the mount,
> > then you probably can just resurrect and rename the original version of
> > errseq_sample. Something like this, but with a different name:
> >
> > +errseq_t errseq_sample(errseq_t *eseq)
> > +{
> > +	errseq_t old = READ_ONCE(*eseq);
> > +	errseq_t new = old;
> > +
> > +	/*
> > +	 * For the common case of no errors ever having been set, we can skip
> > +	 * marking the SEEN bit. Once an error has been set, the value will
> > +	 * never go back to zero.
> > +	 */
> > +	if (old != 0) {
> > +		new |= ERRSEQ_SEEN;
> > +		if (old != new)
> > +			cmpxchg(eseq, old, new);
> > +	}
> > +	return new;
> > +}
>
> Yes, a helper like this should solve the issue at hand. We are not
> interested in previous errors. This also sets the ERRSEQ_SEEN on
> sample and it will also solve the other issue when after sampling
> if error gets seen, we don't want errseq_check() to return error.
>
> Thinking of some possible names for new function.
>
> errseq_sample_seen()
> errseq_sample_set_seen()
> errseq_sample_consume_unseen()
> errseq_sample_current()
>
> Thanks
> Vivek
>

I think we can just replace the code in super.c with:

	ofs->upper_errseq = READ_ONCE(upper_mnt_sb->s_wb_err);

And then add an errseq helper which checks:

int errseq_check_ignore_seen(errseq_t *eseq, errseq_t since)
{
	errseq_t cur = READ_ONCE(*eseq);

	if ((cur == since) || (cur == (since | ERRSEQ_SEEN)))
		return 0;

	return -(cur & MAX_ERRNO);
}

---

The extra (cur == (since | ERRSEQ_SEEN)) comparison covers the case where
the only change since we sampled is that cur has been marked "seen". I
think we do not want to do the cmpxchg, because setting the seen bit
ourselves would hide the error from a user who later does a syncfs. If
since already had seen set but cur does not, it means the counter has
wrapped.
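
To make the intended semantics concrete, here is a minimal userspace sketch
of the helper above. It is a single-threaded model, not kernel code: the
constants are my recollection of lib/errseq.c, and model_errseq_set(),
model_mark_seen() and errseq_demo() are hypothetical stand-ins (no
READ_ONCE or cmpxchg) that exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint32_t errseq_t;

/* Assumed to mirror lib/errseq.c; treat these values as assumptions. */
#define MAX_ERRNO	4095u
#define ERRSEQ_SHIFT	12	/* ilog2(MAX_ERRNO + 1) */
#define ERRSEQ_SEEN	(1u << ERRSEQ_SHIFT)
#define ERRSEQ_CTR_INC	(1u << (ERRSEQ_SHIFT + 1))

/* Hypothetical model of errseq_set(): store the errno and bump the
 * counter only if the previous value had already been seen. */
static errseq_t model_errseq_set(errseq_t *eseq, int err)
{
	errseq_t old = *eseq;
	errseq_t new = (old & ~(MAX_ERRNO | ERRSEQ_SEEN)) | (errseq_t)(-err);

	if (old & ERRSEQ_SEEN)
		new += ERRSEQ_CTR_INC;
	*eseq = new;
	return new;
}

/* Hypothetical model of some other reader (e.g. a syncfs caller)
 * observing the error, which marks it as seen. */
static void model_mark_seen(errseq_t *eseq)
{
	if (*eseq != 0)
		*eseq |= ERRSEQ_SEEN;
}

/* The proposed check. Note the parentheses around (since | ERRSEQ_SEEN):
 * == binds tighter than |, so without them the test is always true. */
static int errseq_check_ignore_seen(errseq_t *eseq, errseq_t since)
{
	errseq_t cur = *eseq;

	if (cur == since || cur == (since | ERRSEQ_SEEN))
		return 0;

	return -(int)(cur & MAX_ERRNO);
}

/* Walks through the volatile-mount scenario; returns 0 if every step
 * behaves as described in the comments. */
int errseq_demo(void)
{
	errseq_t wb_err = 0;
	errseq_t since;

	/* An unseen error already exists before the "mount". */
	model_errseq_set(&wb_err, -EIO);

	/* Mount time: raw read, no SEEN bit set, no cmpxchg. */
	since = wb_err;
	assert(errseq_check_ignore_seen(&wb_err, since) == 0);

	/* Someone else sees the old error: still not a new error for us. */
	model_mark_seen(&wb_err);
	assert(errseq_check_ignore_seen(&wb_err, since) == 0);

	/* A new error after the mount bumps the counter and is reported. */
	model_errseq_set(&wb_err, -EIO);
	assert(errseq_check_ignore_seen(&wb_err, since) == -EIO);

	return 0;
}
```

Calling errseq_demo() from a small main() exercises the three cases: a
pre-existing unseen error at mount time, the same error becoming seen
afterwards, and a genuinely new error. Only the last one makes the check
return an error.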