On Sat, 2020-12-19 at 07:53 -0500, Jeff Layton wrote: > On Sat, 2020-12-19 at 06:13 +0000, Matthew Wilcox wrote: > > On Thu, Dec 17, 2020 at 10:00:37AM -0500, Jeff Layton wrote: > > > Overlayfs's volatile mounts want to be able to sample an error for their > > > own purposes, without preventing a later opener from potentially seeing > > > the error. > > > > umm ... can't they just copy the errseq_t they're interested in, followed > > by calling errseq_check() later? > > > > They don't want the sampling for the volatile mount to prevent later > openers from seeing an error that hasn't yet been reported. > > If they copy the errseq_t (or just do an errseq_sample), and then follow > it with a errseq_check_and_advance then the SEEN bit will end up being > set and a later opener wouldn't see the error. > > Aside from that though, I think this patch clarifies things a bit since > the SEEN flag currently means two different things: > > 1/ do I need to increment the counter when recording another error? > > 2/ do I need to report this error to new samplers (at open time) > > That was ok before, since we those conditions were always changed > together, but with the overlayfs volatile mount use-case, it no longer > does. > > > actually, isn't errseq_check() buggy in the face of multiple > > watchers? consider this: > > > > worker.es starts at 0 > > t2.es = errseq_sample(&worker.es) > > errseq_set(&worker.es, -EIO) > > t1.es = errseq_sample(&worker.es) > > t2.err = errseq_check_and_advance(&es, t2.es) > > ** this sets ERRSEQ_SEEN ** > > t1.err = errseq_check(&worker.es, t1.es) > > ** reports an error, even though the only change is that > > ERRSEQ_SEEN moved **. > > > > i think errseq_check() should be: > > > > if (likely(cur | ERRSEQ_SEEN) == (since | ERRSEQ_SEEN)) > > return 0; > > > > i'm not yet convinced other changes are needed to errseq. but i am > > having great trouble understanding exactly what overlayfs is trying to do. > > I think you're right on errseq_check. I'll plan to do a patch to fix > that up as well. > I take it back. This isn't a problem for a couple of (non-obvious) reasons: 1/ any "real" sample will either have the SEEN bit set or be 0. In the first case, errseq_check will return true (which is correct). In the second any change from what you sampled will have to have been a recorded error. 2/ ...but even if that weren't the case and you got a false positive from errseq_check, all of the existing callers call errseq_check_and_advance just afterward, which handles this correctly. Either way, splitting the flag in two means that we do need to take the flag bits into account, so errseq_same masks them off before comparing. > I too am having a bit of trouble understanding all of the nuances here. > My current understanding is that it depends on the "volatility" of the > mount: > > normal (non-volatile): they basically want to be able to track errors as > if the files were being opened on the upper layer. For this case I think > they should aim to just do all of the error checking against the upper > sb and ignore the overlayfs s_wb_err field. This does mean pushing the > errseq_check_and_advance down into the individual filesystems in some > fashion though. > > volatile: they want to sample at mount time and always return an error > to syncfs if there has been another error since the original sample > point. This sampling should also not affect later openers on the upper > layer (or on other overlayfs mounts). > > I'm not 100% clear on whether I understand both use-cases correctly > though. -- Jeff Layton <jlayton@xxxxxxxxxx>