Re: [PATCH v7] overlayfs: Provide a mount option "volatile" to skip sync

Vivek Goyal <vgoyal@xxxxxxxxxx> · Fri, 6 Nov 2020 14:20:24 -0500

On Fri, Nov 06, 2020 at 09:00:07PM +0200, Amir Goldstein wrote:
> On Fri, Nov 6, 2020 at 7:59 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> >
> > On Mon, Aug 31, 2020 at 11:15 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >
> > > Container folks are complaining that dnf/yum issues too many sync while
> > > installing packages and this slows down the image build. Build
> > > requirement is such that they don't care if a node goes down while
> > > build was still going on. In that case, they will simply throw away
> > > unfinished layer and start new build. So they don't care about syncing
> > > intermediate state to the disk and hence don't want to pay the price
> > > associated with sync.
> > >
> > > So they are asking for mount options where they can disable sync on overlay
> > > mount point.
> > >
> > > They primarily seem to have two use cases.
> > >
> > > - For building images, they will mount overlay with nosync and then sync
> > >   upper layer after unmounting overlay and reuse upper as lower for next
> > >   layer.
> > >
> > > - For running containers, they don't seem to care about syncing upper
> > >   layer because if node goes down, they will simply throw away upper
> > >   layer and create a fresh one.
> > >
> > > So this patch provides a mount option "volatile" which disables all forms
> > > of sync. Now it is caller's responsibility to throw away upper if
> > > system crashes or shuts down and start fresh.
> > >
> > > With "volatile", I am seeing roughly 20% speed up in my VM where I am just
> > > installing emacs in an image. Installation time drops from 31 seconds to
> > > 25 seconds when nosync option is used. This is for the case of building on top
> > > of an image where all packages are already cached. That way I take
> > > out the network operations latency out of the measurement.
> > >
> > > Giuseppe is also looking to cut down on number of iops done on the
> > > disk. He is complaining that often in cloud their VMs are throttled
> > > if they cross the limit. This option can help them where they reduce
> > > number of iops (by cutting down on frequent sync and writebacks).
> > >
> [...]
> > There is some slightly confusing behaviour here [I realize this
> > behaviour is as intended]:
> >
> > (root) ~ # mount -t overlay -o
> > volatile,index=off,lowerdir=/root/lowerdir,upperdir=/root/upperdir,workdir=/root/workdir
> > none /mnt/foo
> > (root) ~ # umount /mnt/foo
> > (root) ~ # mount -t overlay -o
> > volatile,index=off,lowerdir=/root/lowerdir,upperdir=/root/upperdir,workdir=/root/workdir
> > none /mnt/foo
> > mount: /mnt/foo: wrong fs type, bad option, bad superblock on none,
> > missing codepage or helper program, or other error.
> >
> > From my understanding, the dirty flag should only be a problem if the
> > existing overlayfs is unmounted uncleanly. Docker does
> > this (mount, and re-mounts) during startup time because it writes some
> > files to the overlayfs. I think that we should harden
> > the volatile check slightly, and make it so that within the same boot,
> > it's not a problem, and having to have the user clear
> > the workdir every time is a pain. In addition, the semantics of the
> > volatile patch itself do not appear to be such that they
> > would break mounts during the same boot / mount of upperdir -- as
> > overlayfs does not defer any writes in itself, and it's
> > only that it's short-circuiting writes to the upperdir.
> >
> > Amir,
> > What do you think?
> 
> How do you propose to check that upperdir was used during the same boot?
> 
> Maybe a simpler check  is that upperdir inode is still in cache as an easy way
> around this.
> 
> Add an overlayfs specific inode flag, similar to I_OVL_INUSE
> I_OVL_WAS_INUSE.

So this works only if the inode has not been evicted. That means it will
sometimes work and sometimes error out. If that's the case, the user has
to write code to deal with the error anyway, so it does not make life
any simpler.
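
To illustrate the kind of error handling the user ends up writing either
way, here is a rough sketch, not part of the patch. It assumes the
volatile marker is the directory <workdir>/work/incompat/volatile; the
helper name clear_volatile_marker is made up for this example. The caller
is still responsible for deciding that upper is safe to reuse (for
example, no crash happened since the volatile mount) before calling it.

```python
import os
import shutil


def clear_volatile_marker(workdir):
    """Remove the leftover volatile marker under workdir, if any.

    Assumption: the marker is <workdir>/work/incompat/volatile.
    Returns True if a marker was removed, False if none was found.
    Only call this when the upper layer is known to be safe to reuse.
    """
    marker = os.path.join(workdir, "work", "incompat", "volatile")
    if not os.path.isdir(marker):
        return False
    # Remove only the marker itself; leave the rest of workdir alone.
    shutil.rmtree(marker)
    return True
```

A container runtime could call this when a remount fails within the same
boot and then retry the mount, which is exactly the extra step the
I_OVL_WAS_INUSE idea would not reliably save.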

Maybe sync=fs was the middle-ground option, where we ignore fsync() but
still do a filesystem sync. There we would sync upper on umount and
could then remove this incompat directory.

https://lore.kernel.org/linux-unionfs/20200701215029.GF369085@xxxxxxxxxx/

Thanks
Vivek