Re: [PATCH 1/1] mke2fs: the -d option can now handle tarball input

Andreas Dilger <adilger@xxxxxxxxx> · Thu, 20 Jul 2023 17:53:56 -0600

On Jul 11, 2023, at 5:53 PM, Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> 
> On Mon, Jul 03, 2023 at 07:43:56AM +0200, Johannes Schauer Marin Rodrigues wrote:
>> 
>> Quoting Darrick J. Wong (2023-06-30 17:51:28)
>>> On Tue, Jun 20, 2023 at 02:16:41PM +0200, Johannes Schauer Marin Rodrigues wrote:
>>>> If archive.h is available during compilation, enable mke2fs to read a
>>>> tarball as input. Since libarchive.so.13 is opened with dlopen,
>>>> libarchive is not a hard library dependency of the resulting binary.
>>> 
>>> I can't say I'm in favor of adding build dependencies to e2fsprogs,
>>> since the point of -d taking a directory arg was to *avoid* having to
>>> understand anything other than posix(ish) directory tree walking APIs.
>> 
>> this is why the build dependency is optional.
> 
> As Ted said elsewhere, the big question is (a) do we really want
> e2fsprogs depending on libarchive at all, and (b) is libarchive's API
> stable enough that you'll maintain it for us?  Merging this patch *is*
> adding to the complexity of what most distros consider to be critical
> system utility.

FWIW, I've been looking at using ext4 filesystem images as random-access
archive files that can be directly mounted and used, rather than having
application workflows untar many small files into the filesystem, read
them for a short time, and then delete them again.  That is especially
important for workflows like AI/ML or genomics that are only reading a
subset of the files on each pass.

Storing all of the small files into an ext4 image would allow it to be
loopback mounted and accessed directly without the added write/unlink
overhead for each job.

Being able to import a tarball (or zipfile, or whatever libarchive allows)
directly into an ext4 image would be super convenient for this, so I'd
be in favor of including this functionality into mke2fs.  I hadn't even
though of this aspect of the workflow, but it would certainly simplify
things.

>> It should be perfectly possible to build e2fsprogs without libarchive
>> as well. I copied the pattern that was already implemented for libmagic
>> which is also not a hard dependency but gets dlopened-ed at runtime.
>> If this mechanism is fine for libmagic it should be fine for others, no?

IMHO, yes, if the code is sufficiently isolated and doesn't cause much
ongoing maintenance effort.

Having just looked through the patch, I think it could use some cleanup.
Basic code style issues:
- wrapping lines at 80-columns
- avoid use of C++ comments in the code
- split large highly-indented code blocks into helper functions
- consistent indentation for continued lines
- consistent one blank line between functions
- consistent one blank line after variable declarations

In terms of code structure, refactoring it to put libarchive handling
in a separate file  (e.g. mke2fs-archive.c or similar) would also make
the maintenance easier, since it can be added/removed from the build
more easily, and (if necessary) removed from the tree if it is no longer
working.

Then have only a couple of small function calls in the main mke2fs.c
code that are accessing the libarchive functionality if it is built-in,
or being no-ops (or just printing the error message) if libarchive is
unavailable.

Cheers, Andreas

Attachment:
signature.asc

Description: Message signed with OpenPGP