On Fri, 2020-11-27 at 22:11 +0000, Sargun Dhillon wrote:
> On Fri, Nov 27, 2020 at 02:52:52PM +0200, Amir Goldstein wrote:
> > On Fri, Nov 27, 2020 at 11:21 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > >
> > > This documents behaviour that was discussed in a thread about the volatile
> > > feature. Specifically, how failures can go hidden from asynchronous writes
> > > (such as from mmap, or writes that are not immediately flushed to the
> > > filesystem). Although we pass through calls like msync, fallocate, and
> > > write, and will still return errors on those, it doesn't guarantee all
> > > kinds of errors will happen at those times, and thus may hide errors.
> > >
> > > In the future, we can add error checking to all interactions with the
> > > upperdir, and pass through errseq_t from the upperdir on mappings,
> > > and other interactions with the filesystem[1].
> > >
> > > [1]: https://lore.kernel.org/linux-unionfs/20201116045758.21774-1-sargun@xxxxxxxxx/T/#m7d501f375e031056efad626e471a1392dd3aad33
> > >
> > > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx>
> > > Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> > > Cc: linux-unionfs@xxxxxxxxxxxxxxx
> > > Cc: Miklos Szeredi <miklos@xxxxxxxxxx>
> > > Cc: Amir Goldstein <amir73il@xxxxxxxxx>
> > > Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
> > > ---
> > >  Documentation/filesystems/overlayfs.rst | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> > > index 580ab9a0fe31..c6e30c1bc2f2 100644
> > > --- a/Documentation/filesystems/overlayfs.rst
> > > +++ b/Documentation/filesystems/overlayfs.rst
> > > @@ -570,7 +570,11 @@ Volatile mount
> > >  This is enabled with the "volatile" mount option. Volatile mounts are not
> > >  guaranteed to survive a crash. It is strongly recommended that volatile
> > >  mounts are only used if data written to the overlay can be recreated
> > > -without significant effort.
> > > +without significant effort. In addition to this, the sync family of syscalls
> > > +are not sufficient to determine whether a write failed as sync calls are
> > > +omitted. For this reason, it is important that the filesystem used by the
> > > +upperdir handles failure in a fashion that's suitable for the user. For
> > > +example, upon detecting a fault, ext4 can be configured to panic.
> > >
> >
> > Reading this now, I think I may have wrongly analysed the issue.
> > Specifically, when I wrote that the very minimum is to document the
> > issue, it was under the assumption that a proper fix is hard.
> > I think I was wrong and that the very minimum is to check for errseq
> > since mount on the fsync and syncfs calls.
> >
> > Why? First of all, because it is very very simple and goes a long way to
> > fix the broken contract with applications, not the contract about durability
> > obviously, but the contract about write+fsync+read expects to find the written
> > data (during the same mount era).
> >
> > Second, because the sentence you added above is hard for users to understand
> > and out of context. If we properly handle the writeback error in fsync/syncfs,
> > then this sentence becomes irrelevant.
> > The fact that ext4 can lose data if the application did not fsync after
> > write is true for volatile as well as non-volatile and it is therefore not
> > relevant in the context of overlayfs documentation at all.
> >
> > Am I wrong saying that it is very very simple to fix?
> > Would you mind making that fix at the bottom of the patch series, so it can
> > be easily applied to stable kernels?
> >
> > Thanks,
> > Amir.
>
> I'm not sure it's so easy. In VFS, there are places where the superblock's
> errseq is checked[1]. AFAIK, that's going to check "our" errseq, and not the
> errseq of the corresponding real file's superblock.
> One solution might be as part of all these callbacks we set our errseq to the
> errseq of the filesystem that the upperdir is on, and then rely on VFS's
> checking.
>
> I'm having a hard time figuring out how to deal with the per-mapping based
> error tracking. It seems like this infrastructure was only partially completed
> by Jeff Layton[2]. I don't know how it's actually supposed to work right now,
> as not all of his patches landed.
>

The patches in the series were all merged, but we ended up going with a
simpler solution [1] than the first series I posted. Instead of plumbing
the errseq_t handling down into sync_fs, we did it in the syscall
wrapper.

I think the tricky part here is that there is no struct file plumbed
into ->sync_fs, so you don't have an errseq_t cursor to work with by the
time that gets called.

What may be easiest is to just propagate the s_wb_err value from the
upper_sb to the overlayfs superblock in ovl_sync_fs(). That should get
called before the errseq_check_and_advance in the syncfs syscall wrapper
and should ensure that syncfs() calls against the overlayfs sb see any
errors that occurred on the upper_sb.

Something like this maybe? Totally untested of course. May also need to
think about how to ensure ordering between racing syncfs calls too
(don't really want the s_wb_err going "backward"):

----------------------------8<---------------------------------
$ git diff
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983bcfbb3..d725705abdac 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -283,6 +283,9 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 	ret = sync_filesystem(upper_sb);
 	up_read(&upper_sb->s_umount);
 
+	/* Propagate s_wb_err from upper_sb to overlayfs sb */
+	WRITE_ONCE(sb->s_wb_err, READ_ONCE(upper_sb->s_wb_err));
+
 	return ret;
 }
----------------------------8<---------------------------------

[1]: https://www.spinics.net/lists/linux-api/msg41276.html

How about I split this into two patchsets?
One, where I add the logic to copy the errseq on callbacks to fsync from the
upperdir to the ovl fs superblock, and thus allowing VFS to bubble up errors,
and the documentation. We can CC stable on those because I think it has an
effect that's universal across all filesystems.

P.S.
I notice you maintain overlay tests outside of the kernel. Unfortunately, I
think for this kind of test, it requires in-kernel code to artificially bump
the writeback error count on the upperdir, or it requires the failure
injection infrastructure.

Simulating this behaviour is non-trivial without in-kernel support:

P1: Open(f) -> p1.fd
P2: Open(f) -> p2.fd
P1: syncfs(p1.fd) -> errno
P2: syncfs(p2.fd) -> 0 # This should return an error

[1]: https://elixir.bootlin.com/linux/latest/source/fs/sync.c#L175
[2]: https://lwn.net/Articles/722250/

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
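
For readers following along, the P1/P2 scenario above can be sketched in
userspace. This is only a simplified, single-threaded model of errseq_t
(errno in the low bits, a "seen" flag, a change counter), not the kernel
implementation: every model_* name and MODEL_* constant is hypothetical, and
the plain copy in model_syncfs() stands in for the untested propagation idea
in the diff. It makes no attempt to address the ordering concern between
racing syncfs calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace model of errseq_t; NOT the kernel code. */
typedef uint32_t errseq_t;

#define MODEL_ERRNO_MASK 0xfffu      /* low 12 bits hold the errno value */
#define MODEL_SEEN       (1u << 12)  /* set once some caller observed the error */
#define MODEL_CTR_INC    (1u << 13)  /* one step of the change counter */

/* Hypothetical stand-ins for a superblock and an open file. */
struct model_sb   { errseq_t s_wb_err; struct model_sb *upper; };
struct model_file { struct model_sb *sb; errseq_t f_sb_err; };

/* Record a writeback error (negative errno) in the tracker. */
static void model_errseq_set(errseq_t *eseq, int err)
{
	assert(err < 0 && (uint32_t)(-err) <= MODEL_ERRNO_MASK);
	errseq_t old = *eseq;
	errseq_t val = (old & ~(MODEL_ERRNO_MASK | MODEL_SEEN)) | (errseq_t)(-err);

	/* Bump the counter only if the previous value had been observed. */
	if (old & MODEL_SEEN)
		val += MODEL_CTR_INC;
	*eseq = val;
}

/* Take a cursor; an error nobody has seen yet must still be reported. */
static errseq_t model_errseq_sample(const errseq_t *eseq)
{
	return (*eseq & MODEL_SEEN) ? *eseq : 0;
}

/* Report an error if the tracker changed since the cursor was taken. */
static int model_errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
{
	errseq_t cur = *eseq;

	if (cur == *since)
		return 0;
	cur |= MODEL_SEEN;
	*eseq = cur;
	*since = cur;
	return -(int)(cur & MODEL_ERRNO_MASK);
}

/* Opening a file samples the superblock's error state as its cursor. */
static struct model_file model_open(struct model_sb *sb)
{
	struct model_file f = { sb, model_errseq_sample(&sb->s_wb_err) };
	return f;
}

/* Models the syncfs path: copy the upper sb's value, then check the cursor. */
static int model_syncfs(struct model_file *f)
{
	if (f->sb->upper)
		f->sb->s_wb_err = f->sb->upper->s_wb_err; /* plain copy, no ordering */
	return model_errseq_check_and_advance(&f->sb->s_wb_err, &f->f_sb_err);
}
```

In this model, opening two files on an overlay-like model_sb (one whose
upper pointer is set), recording -EIO on the upper model_sb, and then calling
model_syncfs() on each file reports -EIO to both, which is the outcome the
P2 line above asks for.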