On Mon, Nov 09, 2020 at 09:39:59PM +0200, Amir Goldstein wrote:
> On Mon, Nov 9, 2020 at 7:26 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> >
> > On Mon, Nov 9, 2020 at 9:22 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Nov 06, 2020 at 09:00:07PM +0200, Amir Goldstein wrote:
> > > > On Fri, Nov 6, 2020 at 7:59 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Aug 31, 2020 at 11:15 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Container folks are complaining that dnf/yum issues too many syncs
> > > > > > while installing packages and this slows down the image build. The
> > > > > > build requirement is such that they don't care if a node goes down
> > > > > > while a build is still in progress. In that case, they will simply
> > > > > > throw away the unfinished layer and start a new build. So they don't
> > > > > > care about syncing intermediate state to the disk and hence don't
> > > > > > want to pay the price associated with sync.
> > > > > >
> > > > > > So they are asking for a mount option where they can disable sync on
> > > > > > the overlay mount point.
> > > > > >
> > > > > > They primarily seem to have two use cases.
> > > > > >
> > > > > > - For building images, they will mount overlay with nosync and then
> > > > > >   sync the upper layer after unmounting overlay, and reuse upper as
> > > > > >   lower for the next layer.
> > > > > >
> > > > > > - For running containers, they don't seem to care about syncing the
> > > > > >   upper layer because if the node goes down, they will simply throw
> > > > > >   away the upper layer and create a fresh one.
> > > > > >
> > > > > > So this patch provides a mount option "volatile" which disables all
> > > > > > forms of sync. Now it is the caller's responsibility to throw away
> > > > > > upper if the system crashes or shuts down, and start fresh.
> > > > > >
> > > > > > With "volatile", I am seeing roughly 20% speed up in my VM where I
> > > > > > am just installing emacs in an image. Installation time drops from
> > > > > > 31 seconds to 25 seconds when the nosync option is used. This is for
> > > > > > the case of building on top of an image where all packages are
> > > > > > already cached. That way I take the network operation latency out of
> > > > > > the measurement.
> > > > > >
> > > > > > Giuseppe is also looking to cut down on the number of iops done on
> > > > > > the disk. He is complaining that in the cloud their VMs are often
> > > > > > throttled if they cross the limit. This option can help them reduce
> > > > > > the number of iops (by cutting down on frequent sync and writebacks).
> > > > >
> > > > > [...]
> > > > > There is some slightly confusing behaviour here [I realize this
> > > > > behaviour is as intended]:
> > > > >
> > > > > (root) ~ # mount -t overlay -o
> > > > > volatile,index=off,lowerdir=/root/lowerdir,upperdir=/root/upperdir,workdir=/root/workdir
> > > > > none /mnt/foo
> > > > > (root) ~ # umount /mnt/foo
> > > > > (root) ~ # mount -t overlay -o
> > > > > volatile,index=off,lowerdir=/root/lowerdir,upperdir=/root/upperdir,workdir=/root/workdir
> > > > > none /mnt/foo
> > > > > mount: /mnt/foo: wrong fs type, bad option, bad superblock on none,
> > > > > missing codepage or helper program, or other error.
> > > > >
> > > > > From my understanding, the dirty flag should only be a problem if the
> > > > > existing overlayfs is unmounted uncleanly. Docker does this (mounts
> > > > > and re-mounts) during startup because it writes some files to the
> > > > > overlayfs.
> > > > > I think that we should harden the volatile check slightly and make it
> > > > > so that, within the same boot, it's not a problem; having to have the
> > > > > user clear the workdir every time is a pain. In addition, the
> > > > > semantics of the volatile patch itself do not appear to be such that
> > > > > they would break mounts during the same boot / mount of upperdir --
> > > > > as overlayfs does not defer any writes in itself, and it's only that
> > > > > it's short-circuiting writes to the upperdir.
> > > > >
> > > > > Amir,
> > > > > What do you think?
> > > >
> > > > How do you propose to check that upperdir was used during the same boot?
> > >
> > > Can we read and store "/proc/sys/kernel/random/boot_id"? I am assuming
> > > this will change if the system comes back up after a
> > > shutdown/reboot/crash.
> > >
> > > If boot_id has not changed, we can allow the remount and delete the
> > > incompat dir ourselves. Maybe we can drop a file in incompat to store
> > > boot_id at the time of overlay mount.
> > >
> > > Thanks
> > > Vivek
> > >
> >
> > Storing boot_id is not good enough. You need to store the identity of
> > the superblock, because remounts can occur. Also, if errors happen
> > after flushing pages through writeback, they may never have been
> > reported to the user, so we need to see if those happened as well.
>
> It is not clear to me what problem we are trying to solve.
> What is wrong with the userspace option to remove the dirty file?
>
> Docker has to be changed anyway to use the 'volatile' mount option,
> right?

Is this about detecting a writeback error on remount (one which might have
happened after the umount of the volatile overlay)? But that should be
doable in user space too. That is, when syncfs is issued on upper/, it
should return an error if something failed.

Having said that, I guess Sargun does not want to issue sync on upper due
to its effect on other containers' latencies. He probably wants normal
writeback and, if there is an error in that writeback, to detect that
error upon the next mount of upper/. And all this is detected by keeping
track of upper's superblock identity and errseq_t somewhere in overlay.
I have not looked at this patch yet, just guessing...

To make the two options concrete, a couple of rough, untested sketches
are appended below.

Vivek
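
Sketch 1, the user space route. This is only my illustration of "sync the
upper fs, then clear the dirty marker", not code from any posted patch;
the marker path under workdir and the assumption that syncfs() reports
writeback errors (and only errors that happen after the fd was opened,
on kernels that track per-sb writeback errors at all) are guesses on my
part.

/*
 * Untested sketch: run after "umount /mnt/foo" and before reusing
 * upperdir as a lower layer.  UPPER and MARKER are example paths,
 * not paths defined by the patch.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define UPPER  "/root/upperdir"
#define MARKER "/root/workdir/work/incompat/volatile"  /* assumed location */

int main(void)
{
	int fd = open(UPPER, O_RDONLY | O_DIRECTORY);

	if (fd < 0) {
		perror("open upperdir");
		return 1;
	}

	/*
	 * Flush the whole upper filesystem.  If some earlier writeback
	 * failed (and the kernel reports it through syncfs), throw the
	 * layer away instead of clearing the marker.
	 */
	if (syncfs(fd) < 0) {
		perror("syncfs upperdir");
		return 1;
	}
	close(fd);

	/*
	 * Upper looks clean; drop the volatile marker so the next overlay
	 * mount of this upperdir/workdir is allowed again.  (rm -rf via
	 * system() only to keep the sketch short.)
	 */
	return system("rm -rf " MARKER) == 0 ? 0 : 1;
}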
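
Sketch 2, the kind of in-kernel state I am guessing Sargun has in mind.
Again, this is not taken from his patch; the struct layout and the ovl_*
helper names are made up for illustration, while errseq_sample() /
errseq_check() and sb->s_wb_err are the existing VFS pieces such tracking
would presumably build on.

/*
 * Illustration only -- not the actual patch.  Idea: when the volatile
 * overlay is mounted, remember the current boot and where the upper
 * sb's writeback error cursor was; on a later mount, only accept reuse
 * of the same upperdir if both still match.
 */
#include <linux/errseq.h>
#include <linux/fs.h>
#include <linux/string.h>
#include <linux/types.h>

struct ovl_volatile_info {		/* hypothetical record kept with the marker */
	errseq_t	errseq;		/* upper sb error cursor at mount time */
	u8		boot_id[16];	/* /proc/sys/kernel/random/boot_id value */
};

/* At mount time: record the boot and sample the upper sb's error sequence. */
static void ovl_volatile_sample(struct super_block *upper_sb,
				struct ovl_volatile_info *info,
				const u8 *cur_boot_id)
{
	memcpy(info->boot_id, cur_boot_id, sizeof(info->boot_id));
	info->errseq = errseq_sample(&upper_sb->s_wb_err);
}

/*
 * On a later mount of the same upper/work dirs: reuse is only safe if we
 * are still in the same boot and no new writeback error has been recorded
 * on the upper sb since the sample was taken.
 */
static bool ovl_volatile_reuse_ok(struct super_block *upper_sb,
				  const struct ovl_volatile_info *info,
				  const u8 *cur_boot_id)
{
	if (memcmp(info->boot_id, cur_boot_id, sizeof(info->boot_id)))
		return false;

	return !errseq_check(&upper_sb->s_wb_err, info->errseq);
}

Even then, Sargun's point about superblock identity still applies:
boot_id plus errseq alone does not prove that the same filesystem
instance is backing upperdir across remounts.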