On Mon, Dec 28, 2020 at 7:26 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Mon, 2020-12-28 at 15:56 +0000, Matthew Wilcox wrote:
> > On Mon, Dec 28, 2020 at 08:25:50AM -0500, Jeff Layton wrote:
> > > To be clear, the main thing you'll lose with the method above is the
> > > ability to see an unseen error on a newly opened fd, if there was an
> > > overlayfs mount using the same upper sb before your open occurred.
> > >
> > > IOW, consider two overlayfs mounts using the same upper layer sb:
> > >
> > > ovlfs1                        ovlfs2
> > > ----------------------------------------------------------------------
> > > mount
> > > open fd1
> > > write to fd1
> > > <writeback fails>
> > >                               mount (upper errseq_t SEEN flag marked)
> > >                               open fd2
> > >                               syncfs(fd2)
> > > syncfs(fd1)
> > >
> > >
> > > On a "normal" (non-overlay) fs, you'd get an error back on both syncfs
> > > calls. The first one has a sample from before the error occurred, and
> > > the second one has a sample of 0, due to the fact that the error was
> > > unseen at open time.
> > >
> > > On overlayfs, with the intervening mount of ovlfs2, syncfs(fd1) will
> > > return an error and syncfs(fd2) will not. If we split the SEEN flag into
> > > two, then we can ensure that they both still get an error in this
> > > situation.
> >
> > But do we need to? If the inode has been evicted we also lose the errno.
> > The guarantee we provide is that a fd that was open before the error
> > occurred will see the error. An fd that's opened after the error occurred
> > may or may not see the error.
> >
>
> In principle, you can lose errors this way (which was the justification
> for making errseq_sample return 0 when there are unseen errors). E.g.,
> if you close fd1 instead of doing a syncfs on it, that error will be
> lost forever.
>
> As to whether that's OK, it's hard to say. It is a deviation from how
> this works in a non-containerized situation, and I'd argue that it's
> less than ideal.
> You may or may not see the error on fd2, but it's
> dependent on events that take place outside the container and that
> aren't observable from within it. That effectively makes the results
> non-deterministic, which is usually a bad thing in computing...
>

I understand that user experience inside containers will deviate
from non-containerized use cases. I can't say that I fully understand
the situations that deviate.

Having said that, I never objected to the SEEN flag split.
To me, the split looks architecturally correct. If for nothing else,
then for not observing past errors inside the overlay mount.

I think you still need to convince Matthew though.

The question remains what, if anything, should be nominated for stable.
I was trying to propose the minimal patch that fixes the most basic
syncfs overlayfs issues. In that context, it seemed that the issues
that the SEEN flag split solves are not on the MUST HAVE list,
but maybe I am wrong.

Sargun,

How about sending another version of your patch, with or without the
SEEN flag split (up to you), but covering both the volatile and
non-volatile cases, following my proposal?

At least we can continue debating on a concrete patch instead of
an envisioned combination of pieces posted to the list.
If you can give some examples of use cases that the patch fixes
with and without the SEEN flag split, that could be useful for the
discussion.

Thanks,
Amir.
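
For anyone following along, the two-mount timeline quoted above can be
replayed with a rough userspace model. This is only a sketch: the class
and function names are hypothetical, it is single-threaded, and it
assumes each fd simply holds a sample of the upper sb's errseq_t and
that the ovlfs2 mount does nothing except mark the current error SEEN.
It is not the kernel code or the proposed patch.

```python
# Toy model of errseq_t semantics as described in this thread.
# All names here are hypothetical; this is not the kernel implementation.
MAX_ERRNO = 4095        # low bits hold the errno
SEEN = 1 << 12          # flag: somebody has already observed this error
CTR_INC = 1 << 13       # counter lives above the SEEN flag
EIO = 5


class Errseq:
    def __init__(self):
        self.value = 0

    def set(self, err):
        """Record a writeback error (err is a positive errno)."""
        old = self.value
        new = (old & ~(MAX_ERRNO | SEEN)) | err
        if old & SEEN:
            # Only bump the counter if the previous value was observed.
            new += CTR_INC
        self.value = new

    def sample(self):
        """Take a sample at open time; returns 0 if the error is unseen."""
        return self.value if self.value & SEEN else 0

    def mark_seen(self):
        """Assumed effect of the intervening overlayfs mount."""
        self.value |= SEEN

    def check_and_advance(self, since):
        """Return (errno, new_since) for a syncfs-style check."""
        cur = self.value
        if cur == since:
            return 0, since
        self.value = cur | SEEN
        return cur & MAX_ERRNO, self.value


def run_timeline(intervening_mount):
    """Replay the timeline; returns (error on fd1, error on fd2)."""
    upper_sb = Errseq()
    fd1 = upper_sb.sample()          # open fd1 before the error
    upper_sb.set(EIO)                # <writeback fails>
    if intervening_mount:
        upper_sb.mark_seen()         # ovlfs2 mount marks SEEN
    fd2 = upper_sb.sample()          # open fd2 after the error
    err_fd2, _ = upper_sb.check_and_advance(fd2)   # syncfs(fd2)
    err_fd1, _ = upper_sb.check_and_advance(fd1)   # syncfs(fd1)
    return err_fd1, err_fd2
```

Under those assumptions, run_timeline(False) reports EIO on both fds,
while run_timeline(True) reports EIO on fd1 only, which is the
deviation Jeff describes.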