On Mon, Nov 16, 2020 at 6:36 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Mon, Nov 16, 2020 at 05:20:04PM +0200, Amir Goldstein wrote:
> > On Mon, Nov 16, 2020 at 4:42 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Nov 15, 2020 at 08:57:58PM -0800, Sargun Dhillon wrote:
> > > > Overlayfs added the ability to set up mounts where all syncs can be
> > > > short-circuited (2a99ddacee43: ovl: provide a mount option "volatile").
> > > >
> > > > A user might want to remount this fs, but we do not let the user do so
> > > > because of the "incompat" detection feature. In the case of volatile,
> > > > it is safe to do something like[1]:
> > > >
> > > > $ sync -f /root/upperdir
> > > > $ rm -rf /root/workdir/incompat/volatile
> > > >
> > > > There are two ways to go about this. You can call sync on the underlying
> > > > filesystem, check the error code, and delete the dirty file if everything
> > > > is clean. If you're running lots of containers on the same filesystem, or
> > > > you want to avoid all unnecessary I/O, this may be suboptimal.
> > > >
> > >
> > > Hi Sargun,
> > >
> > > I had asked a bunch of questions in the previous mail thread to get a
> > > clearer picture of your requirements but never got any response. It
> > > would have helped in understanding your requirements better.
> > >
> > > How about the following patch set, which seems to sync only the dirty
> > > inodes of upper belonging to a particular overlayfs instance:
> > >
> > > https://lore.kernel.org/linux-unionfs/20201113065555.147276-1-cgxu519@xxxxxxxxxxxx/
> > >
> > > So if we could implement a mount option which ignores fsync but, upon
> > > syncfs, syncs only the dirty inodes of that overlayfs instance, it will
> > > make sure we are not syncing the whole of the upper fs. And we could
> > > do this syncing on unmount of overlayfs and remove the dirty file upon
> > > successful sync.
> > >
> > > This looks like a much simpler method and should be able to meet your
> > > requirements (as long as you are fine with syncing the dirty upper
> > > inodes of this overlay instance on unmount).
> > >
> >
> > Do note that the latest patch set by Chengguang not only syncs the dirty
> > inodes of this overlay instance, but also waits for in-flight writeback
> > on all the upper fs inodes, and I think that with !ovl_should_sync(ofs)
> > we will not re-dirty the ovl inodes and will lose track of the list of
> > dirty inodes - maybe that can be fixed.
> >
> > Also, I am not sure anymore that we can safely remove the dirty file
> > after syncing dirty inodes in sync_fs and umount. If someone did sync_fs
> > before us and consumed the error, we may have a copied up file in upper
> > whose data is not on disk, but when we sync_fs on unmount we won't get
> > an error? Not sure.
>
> Maybe we can save an errseq_t when mounting overlay and compare it with
> the errseq_t stored in the upper sb after unmount. That will tell us
> whether an error has happened since we mounted overlay. (Similar to what
> Sargun is doing.)
>

I suppose so.
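Something like the below is what I would imagine - a completely untested
sketch, where upper_sb (the upper layer's super block) and the
ofs->upper_errseq field are made up just to illustrate:

/* At mount time (e.g. in ovl_fill_super()), sample the writeback
 * error sequence of the upper fs:
 */
ofs->upper_errseq = errseq_sample(&upper_sb->s_wb_err);

/* On unmount, or before clearing the incompat dir on remount, check
 * whether a new writeback error was recorded on upper since then:
 */
if (errseq_check(&upper_sb->s_wb_err, ofs->upper_errseq))
	return -EIO; /* error since mount - keep the dirty file */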
> In fact, if this is a concern, we have this issue with user space
> "sync <upper>" too? Another sync might fail while this one succeeds,
> and we will think upper is just fine. Maybe container tools can
> keep a file/dir open at the time of mount and call syncfs using
> that fd instead. (And that should catch errors since that fd
> was opened, I am assuming.)
>

I did not understand the problem with userspace sync.

> >
> > I am less concerned about ways to allow re-mount of volatile
> > overlayfs than I am about turning volatile overlayfs into non-volatile.
>
> If we are not interested in converting volatile containers into
> non-volatile ones, then the whole point of this patch series is to
> detect whether any writeback error has happened. If a writeback error
> has happened, then we detect that at remount and possibly throw away
> the container.
>
> What happens today if a writeback error has happened? Is that page
> thrown away from the page cache and read back from disk? IOW, will the
> user lose the data it had written in the page cache because writeback
> failed? I am assuming we can't keep the dirty page around for very
> long, otherwise it has the potential to fill up all the available RAM
> with dirty pages which can't be written back.
>

Right. The resulting data is undefined after an error.

> Why is it important to detect a writeback error only during remount?
> What happens if a container overlay instance is already mounted and a
> writeback error happens? We will not detect that, right?
>
> IOW, if capturing writeback errors is important for volatile
> containers, then capturing them only at remount time is not enough.
> Normally fsync/syncfs would catch them, but now we have skipped those,
> so in the process we have lost the mechanism to detect writeback
> errors for volatile containers?
>

Yes, you are right. It's an issue with volatile that we should probably
document.

I think upper file data can "evaporate" even while the overlay is still
mounted.

Thanks,
Amir.
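P.S. FWIW, the fd-based syncfs scheme could look roughly like the below
in a container tool - an untested sketch, assuming a kernel where
syncfs(fd) reports writeback errors seen since the fd was opened, and
with the upperdir path made up:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Open the upper dir when setting up the container and keep
	 * the fd open for the container's lifetime. */
	int fd = open("/root/upperdir", O_RDONLY | O_DIRECTORY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* ... container runs with the volatile overlay mounted ... */

	/* Before removing workdir/incompat/volatile, sync the upper fs
	 * and check for writeback errors since the fd was opened. */
	if (syncfs(fd) < 0) {
		perror("syncfs");
		return 1;	/* not safe to reuse this upper dir */
	}

	close(fd);
	return 0;
}

The fd pins the error sequence from mount time, so an error consumed by
someone else's syncfs in between should still be reported here.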