Re: [PATCH v2 1/4] erofs: add file-backed mount support

Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> · Mon, 30 Sep 2024 16:22:47 +0200

Hi Jan,

On Mon, Sep 30, 2024 at 4:18 PM Jan Kara <jack@xxxxxxx> wrote:
> On Tue 24-09-24 11:21:59, Geert Uytterhoeven wrote:
> > On Fri, Aug 30, 2024 at 5:29 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
> > > It actually has been around for years: For containers and other sandbox
> > > use cases, there will be thousands (and even more) of authenticated
> > > (sub)images running on the same host, unlike OS images.
> > >
> > > Of course, all scenarios can use the same EROFS on-disk format, but
> > > bdev-backed mounts just work well for OS images since golden data is
> > > dumped into real block devices.  However, it's somewhat hard for
> > > container runtimes to manage and isolate so many unnecessary virtual
> > > block devices safely and efficiently [1]: they just look like a burden
> > > to orchestrators and file-backed mounts are preferred indeed.  There
> > > were already enough attempts such as Incremental FS, the original
> > > ComposeFS and PuzzleFS acting in the same way for immutable fses.  As
> > > for current EROFS users, ComposeFS, containerd and Android APEXs will
> > > directly benefit from it.
> > >
> > > On the other hand, previous experimental feature "erofs over fscache"
> > > was once also intended to provide a similar solution (inspired by
> > > Incremental FS discussion [2]), but the following facts show file-backed
> > > mounts will be a better approach:
> > >  - Fscache infrastructure has recently been moved into new Netfslib
> > >    which is an unexpected dependency to EROFS really, although it
> > >    originally claims "it could be used for caching other things such as
> > >    ISO9660 filesystems too." [3]
> > >
> > >  - It takes an unexpectedly long time to upstream Fscache/Cachefiles
> > >    enhancements.  For example, the failover feature took more than
> > >    one year, and the daemonless feature is still far behind now;
> > >
> > >  - Ongoing HSM "fanotify pre-content hooks" [4] together with this will
> > >    perfectly supersede "erofs over fscache" in a simpler way since
> > >    developers (mainly containerd folks) could leverage their existing
> > >    caching mechanism entirely in userspace instead of strictly following
> > >    the predefined in-kernel caching tree hierarchy.
> > >
> > > After "fanotify pre-content hooks" lands upstream to provide the same
> > > functionality, "erofs over fscache" will be removed (as an EROFS
> > > internal improvement), and EROFS will not have to bother with on-demand
> > > fetching and/or caching improvements anymore.
> > >
> > > [1] https://github.com/containers/storage/pull/2039
> > > [2] https://lore.kernel.org/r/CAOQ4uxjbVxnubaPjVaGYiSwoGDTdpWbB=w_AeM6YM=zVixsUfQ@xxxxxxxxxxxxxx
> > > [3] https://docs.kernel.org/filesystems/caching/fscache.html
> > > [4] https://lore.kernel.org/r/cover.1723670362.git.josef@xxxxxxxxxxxxxx
> > >
> > > Closes: https://github.com/containers/composefs/issues/144
> > > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx>
> >
> > Thanks for your patch, which is now commit fb176750266a3d7f
> > ("erofs: add file-backed mount support").
> >
> > > ---
> > > v2:
> > >  - should use kill_anon_super();
> > >  - add O_LARGEFILE to support large files.
> > >
> > >  fs/erofs/Kconfig    | 17 ++++++++++
> > >  fs/erofs/data.c     | 35 ++++++++++++---------
> > >  fs/erofs/inode.c    |  5 ++-
> > >  fs/erofs/internal.h | 11 +++++--
> > >  fs/erofs/super.c    | 76 +++++++++++++++++++++++++++++----------------
> > >  5 files changed, 100 insertions(+), 44 deletions(-)
> > >
> > > diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
> > > index 7dcdce660cac..1428d0530e1c 100644
> > > --- a/fs/erofs/Kconfig
> > > +++ b/fs/erofs/Kconfig
> > > @@ -74,6 +74,23 @@ config EROFS_FS_SECURITY
> > >
> > >           If you are not using a security module, say N.
> > >
> > > +config EROFS_FS_BACKED_BY_FILE
> > > +       bool "File-backed EROFS filesystem support"
> > > +       depends on EROFS_FS
> > > +       default y
> >
> > I am a bit reluctant to have this default to y, without an ack from
> > the VFS maintainers.
>
> Well, we generally let filesystems do whatever they decide to do unless it
> is affecting stability / security / maintainability of the whole system.
> In this case I don't see anything that would be substantially different
> than if we go through a loop device. So although the feature looks somewhat
> unusual I don't see a reason to nack it or otherwise interfere with
> whatever the fs maintainer wants to do. Are you concerned about a
> particular problem?

I was just wondering if there are any issues with accessing files directly.
If you're fine with it, I am, too.
Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds