On Mon, Dec 14, 2020 at 10:35:56AM +1100, NeilBrown wrote: > On Sun, Dec 13 2020, Jeff Layton wrote: > > > Overlayfs's volatile mounts want to be able to sample an error for > > their own purposes, without preventing a later opener from potentially > > seeing the error. > > > > The original reason for the SEEN flag was to make it so that we didn't > > need to increment the counter if nothing had observed the latest value > > and the error was the same. Eventually, a regression was reported in > > the errseq_t conversion, and we fixed that by using the SEEN flag to > > also mean that the error had been reported to userland at least once > > somewhere. > > > > Those are two different states, however. If we instead take a second > > flag bit from the counter, we can track these two things separately, > > and accomodate the overlayfs volatile mount use-case. > > > > Add a new MUSTINC flag that indicates that the counter must be > > incremented the next time an error is set, and rework the errseq > > functions to set and clear that flag whenever the SEEN bit is set or > > cleared. > > > > Test only for the MUSTINC bit when deciding whether to increment the > > counter and only for the SEEN bit when deciding what to return in > > errseq_sample. > > > > Add a new errseq_peek function to allow for the overlayfs use-case. > > This just grabs the latest counter and sets the MUSTINC bit, leaving > > the SEEN bit untouched. > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > --- > > include/linux/errseq.h | 2 ++ > > lib/errseq.c | 64 ++++++++++++++++++++++++++++++++++-------- > > 2 files changed, 55 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/errseq.h b/include/linux/errseq.h > > index fc2777770768..6d4b9bc629ac 100644 > > --- a/include/linux/errseq.h > > +++ b/include/linux/errseq.h > > @@ -9,6 +9,8 @@ typedef u32 errseq_t; > > > > errseq_t errseq_set(errseq_t *eseq, int err); > > errseq_t errseq_sample(errseq_t *eseq); > > +errseq_t errseq_peek(errseq_t *eseq); > > +errseq_t errseq_sample_advance(errseq_t *eseq); > > int errseq_check(errseq_t *eseq, errseq_t since); > > int errseq_check_and_advance(errseq_t *eseq, errseq_t *since); > > #endif > > diff --git a/lib/errseq.c b/lib/errseq.c > > index 81f9e33aa7e7..5cc830f0361b 100644 > > --- a/lib/errseq.c > > +++ b/lib/errseq.c > > @@ -38,8 +38,11 @@ > > /* This bit is used as a flag to indicate whether the value has been seen */ > > #define ERRSEQ_SEEN (1 << ERRSEQ_SHIFT) > > Would this look nicer using the BIT() macro? > > #define ERRSEQ_SEEN BIT(ERRSEQ_SHIFT) > > > > > +/* This bit indicates that value must be incremented even when error is same */ > > +#define ERRSEQ_MUSTINC (1 << (ERRSEQ_SHIFT + 1)) > > #define ERRSEQ_MUSTINC BIT(ERRSEQ_SHIFT+1) > > or if you don't like the BIT macro (not everyone does), then maybe > > #define ERR_SEQ_MUSTINC (ERRSEQ_SEEN << 1 ) > > ?? > > > + > > /* The lowest bit of the counter */ > > -#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1)) > > +#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 2)) > > Ditto. > Yes, I can make that change. The BIT macro is much easier to read. > > > > /** > > * errseq_set - set a errseq_t for later reporting > > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err) > > for (;;) { > > errseq_t new; > > > > - /* Clear out error bits and set new error */ > > - new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err; > > + /* Clear out flag bits and set new error */ > > + new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err; > > This is starting to look clumsy (or maybe, this already looked clumsy, > but now that is hard to ignore). > > new = (old & (ERRSEQ_CTR_INC - 1)) | -err > I think you mean: new = (old & ~(ERRSEQ_CTR_INC - 1)) | -err; Maybe I can add a new ERRSEQ_CTR_MASK value though which makes it more evident. > Also this assumes MAX_ERRNO is a mask, which it is .. today. > > BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1)); > ?? > We already have this in errseq_set: BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1); > > > > - /* Only increment if someone has looked at it */ > > - if (old & ERRSEQ_SEEN) > > + /* Only increment if we have to */ > > + if (old & ERRSEQ_MUSTINC) > > new += ERRSEQ_CTR_INC; > > > > /* If there would be no change, then call it done */ > > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set); > > errseq_t errseq_sample(errseq_t *eseq) > > { > > errseq_t old = READ_ONCE(*eseq); > > + errseq_t new = old; > > > > - /* If nobody has seen this error yet, then we can be the first. */ > > - if (!(old & ERRSEQ_SEEN)) > > - old = 0; > > - return old; > > + /* > > + * For the common case of no errors ever having been set, we can skip > > + * marking the SEEN|MUSTINC bits. Once an error has been set, the value > > + * will never go back to zero. > > + */ > > + if (old != 0) { > > + new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC; > > You lose me here. Why is ERRSEQ_SEEN being set, where it wasn't before? > > The ERRSEQ_SEEN flag not means precisely "The error has been reported to > userspace". > This operations isn't used to report errors - that is errseq_check(). > > I'm not saying the code it wrong - I really cannot tell. > I'm just saying that I cannot see why it might be right. > I think you're right. We should not be setting SEEN here, but we do need to set MUSTINC if it's not already set. I'll fix (and re-test). Thanks for the review! > > > > > + if (old != new) > > + cmpxchg(eseq, old, new); > > + if (!(old & ERRSEQ_SEEN)) > > + return 0; > > + } > > + return new; > > } > > EXPORT_SYMBOL(errseq_sample); > > > > +/** > > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN > > + * @eseq: Pointer to errseq_t to be sampled. > > + * > > + * In some cases, we need to be able to sample the errseq_t, but we're not > > + * in a situation where we can report the value to userland. Use this > > + * function to do that. This ensures that later errors will be recorded, > > + * and that any current errors are reported at least once. > > + * > > + * Context: Any context. > > + * Return: The current errseq value. > > + */ > > +errseq_t errseq_peek(errseq_t *eseq) > > +{ > > + errseq_t old = READ_ONCE(*eseq); > > + errseq_t new = old; > > + > > + if (old != 0) { > > + new |= ERRSEQ_MUSTINC; > > + if (old != new) > > + cmpxchg(eseq, old, new); > > + } > > + return new; > > +} > > +EXPORT_SYMBOL(errseq_peek); > > + > > /** > > * errseq_check() - Has an error occurred since a particular sample point? > > * @eseq: Pointer to errseq_t value to be checked. > > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample); > > */ > > int errseq_check(errseq_t *eseq, errseq_t since) > > { > > - errseq_t cur = READ_ONCE(*eseq); > > + errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > > + > > + /* Clear the flag bits for comparison */ > > + since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > > > > if (likely(cur == since)) > > return 0; > > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since) > > * can advance "since" and return an error based on what we > > * have. > > */ > > - new = old | ERRSEQ_SEEN; > > + new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC; > > if (new != old) > > cmpxchg(eseq, old, new); > > *since = new; > > -- > > 2.29.2