Re: [proposal] making filesystem tools more machine friendly

"Theodore Ts'o" <tytso@xxxxxxx> · Mon, 3 Jul 2017 11:07:22 -0400

On Mon, Jul 03, 2017 at 01:52:49PM +0200, Jan Tulak wrote:
> I want to limit the capabilities of this interface to non-interactive
> only. So, yes, with fsck, JSON would be overkill. But the idea is to
> have a single format across all the tools, so you don't need a
> standalone parser for every tool, even if some tools don't need
> anything more than an exit code and/or one message on stderr. In the
> case of ext2/3/4 it is more about resize2fs and tune2fs, where the
> JSON would be much more useful, than fsck.

I'm not sure what sort of use cases you have in mind where structured
output would be useful.

For mke2fs or resize2fs, in general all you care about is "did the
operation succeed", right?  What did you have in mind that needs more
information than "the operation was successful"?

For tune2fs, what options or output in tune2fs are sufficiently file
system independent that you think it would be worth exporting to your
infrastructure?

> The generic way would be along the lines: These ten common and
> frequently used fields are generic and work everywhere, anything else
> has a prefix (ext4-, xfs-, btrfs-, ...) or is inside of a fs-specific
> list of extensions "ext4":{"some-option":value}. And during the
> parsing, all the fields would be mapped to some mkfs arguments,
> usually 1:1. Similar to what could be done with output, like putting
> volume identifier into a specific field (e.g. "fsid") no matter what
> the filesystem is.

It might be useful if you could give some "user stories" that explain
at a high level what the user might want to do with the ultimate
user-visible interface that would require this kind of precision.

Most users don't know how to use the specialized options to
mkfs.<FSTYP> and to be honest, most don't need to.  The way I've dealt
with this for mke2fs.conf is that when someone has come up with a
specialized recipe for a unique sort of file system type --- say,
maybe for a Lustre Metadata server, or the back-end storage for some
kind of cluster file system like Hadoopfs, or specialized options for
an Android phone --- someone with wizard-level skills will edit
/etc/mke2fs.conf, with perhaps something like this:

    smr-host-managed = {
        features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
        cluster_size = 32768
        hash_alg = half_md4
        reserved_ratio = 0.0
        num_backup_sb = 0
        packed_meta_blocks = 1
        make_hugefiles = 1
        inode_ratio = 4194304
        hugefiles_dir = /smr
        hugefiles_name = smr-file
        hugefiles_digits = 0
        hugefiles_size = 0
        hugefiles_align = 256M
        hugefiles_align_disk = true
        num_hugefiles = 1
        zero_hugefiles = false
        flex_bg_size = 262144
    }

... and then all the user will have to do is run "mke2fs -t ext4 -T
smr-host-managed /dev/sdXX".  (The intended use case for this is to
support an SMR-aware user-space application that was going to be
managing the host-managed SMR zones directly.)
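
A front end built on this needs very little: it maps the profile the
user picked onto the mke2fs command line and looks at the exit status.
A minimal sketch, assuming POSIX fork/execvp/waitpid; the helper names
build_mke2fs_argv() and run_tool() are made up for illustration and
are not part of e2fsprogs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fill argv for
 * "mke2fs -t ext4 -T <profile> <device>".
 * argv must have room for 7 pointers, including the NULL terminator. */
static void build_mke2fs_argv(const char *profile, const char *device,
                              const char *argv[7])
{
    assert(profile != NULL && device != NULL);
    argv[0] = "mke2fs";
    argv[1] = "-t";
    argv[2] = "ext4";
    argv[3] = "-T";
    argv[4] = profile;   /* e.g. a stanza name from /etc/mke2fs.conf */
    argv[5] = device;
    argv[6] = NULL;
}

/* Hypothetical helper: run a command and return its exit status.
 * Returns -1 if fork/waitpid failed or the child died from a signal,
 * and 127 if the program could not be executed. */
static int run_tool(const char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        _exit(127);      /* exec failed; same convention the shell uses */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;   /* a real error, not just an interrupted wait */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing the result of build_mke2fs_argv() to run_tool() would really
format the device, so try run_tool() against something harmless first.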

I suspect telling the user that they have to type a whole series of
parameters of the form:

	"ext4":{"some-option":value}

into a GUI would not be a particularly user-friendly suggestion.  :-)

The other question I'd ask is how many people really are going to want
to use your infrastructure?  Is it only going to be for the "point and
click" users who will want a simplified interface?  Are you trying to
make something that will be useful for advanced/expert users?  What
value are you going to be able to add that will convince the
advanced/expert users that they should learn your new
"fstype":{"some-option":value} syntax when typing at the command line
will probably be ten times faster, easier, and less rage-inducing than
trying to reverse-engineer out some interface that was designed not to
scare the civilians?  (There's a reason why many drivers prefer manual
to automatic transmission on their cars.  :-)

> Thanks for pointing out EVMS, I will see what I can learn from that
> attempt. Starting with screenscraping is certainly an option and might
> be the only viable one. This raises some other questions, though:
> given the temporality of the wrapper, I would rather use other
> languages than C/Bash (e.g. Python) to simplify and speed up the
> development. But I have doubts about whether you would be willing to
> adopt this into e2fsprogs, which would, in turn, reduce the usability
> of this approach.

The reason why you might want to consider C is because:

   * It allows the plugin to be imported into many different
     programming languages: Python, Go, Perl, etc., via using
     something like SWIG.

   * Different file system maintainers will be willing to accept
     maintenance of your plugin at different times.  For some file
     systems, you may have to wrap the command-line tools forever; for
     one thing the file system may no longer be under active
     maintenance (ex: iso9660) but you still might want to use it
     in your GUI interface.  Other file system developers will be
     willing to take over the plugin and support it as a native part
     of their file system tools more quickly.

   * The first operations that you might want to make native
     instead of being screen scraped (getting the file system size,
     the amount of free space, etc.)  are things which are most easily
     done in C.

     So if you want to ask the file system developers to take over the
     plugin, they are much more likely to be willing to say yes if the
     plugin is already in C, as opposed to asking them to take over
     some Python class where trying to integrate Python code to call
     into libext2fs is going to be a pain in the ass.  For that
     matter, you might want to implement the plugins to call libext2fs
     and libxfs directly for those basic functions.  That's what the
     EVMS developers did, and those interfaces in libext2fs are
     guaranteed to have ABI and API stability.  Anyway, if your goal
     is to convince file system developers to eventually take
     ownership of the interface/plugin module, it will be much easier
     to do that if it is in C --- trust me on that.

   * I don't think it's going to be that hard to use C; as I've said,
     I really disbelieve that there are that many places where you
     need to screenscrape.  Most of what you will probably need to do
     is to return the exit status of mke2fs, fsck, resize2fs, etc.
     Those programs that you do need to screen scrape will have
     outputs similar to dumpe2fs, which is stupid-easy to parse, and
     are also, as I've noted above, the simplest thing to move to
     being done in native code calling the file system's C library.
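
To give a feel for how little parsing is involved: dumpe2fs -h prints
one "Name:   value" pair per line, so a screen scraper can be as small
as the sketch below.  The function name split_field() is hypothetical,
and the sample lines in the usage note are illustrative rather than
copied from a real run.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Split a "Name:   value\n" line in place at the first colon.
 * On success, *key and *value point into line, with the value's
 * surrounding whitespace trimmed, and 0 is returned.
 * Returns -1 if the line contains no colon. */
static int split_field(char *line, char **key, char **value)
{
    char *colon, *end;

    assert(line != NULL && key != NULL && value != NULL);
    colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    *colon = '\0';
    *key = line;

    /* Skip the padding between the colon and the value. */
    *value = colon + 1;
    while (isspace((unsigned char)**value))
        (*value)++;

    /* Trim the trailing newline and any other trailing whitespace. */
    end = *value + strlen(*value);
    while (end > *value && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return 0;
}
```

Given "Block count:              2621440\n", this yields the key
"Block count" and the value "2621440", which can go straight to
strtoull().  Only the first colon is treated as the separator, so
values that contain colons themselves, such as timestamps, stay
intact.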

Oh, one thing.  I'll note that e2fsprogs has progress bar support
already, and it was designed so it could be easily integrated into a
GUI.  As far as I know Ubuntu was the only distro that used it ---
progress bars tend not to be high on most distros' product manager's
feature priority lists --- but it's there.  See e2fsck's -C option.
This support was also plumbed into fsck (see its -C option), so I
designed it to be something that other file systems could implement.

Also, we're Linux systems programmers, not Web developers writing
Javascript; why use JSON and require fancy parsing when you can just
isolate the completion information onto a separate file descriptor?  I
don't know how big and complex your JSON parsing library is, but all
that was needed to parse *my* completion information is the single
line of C code:

	fscanf(progress_f, "%d %lu %lu %ms\n", &pass, &cur, &max, &text).
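
For anyone who wants something slightly more defensive, here is a
sketch of the same idea that parses one record at a time from a line
the caller has already read (with fgets(), say).  The struct and
function names are hypothetical, the fixed 255-byte buffer replaces
%ms so that it does not depend on the "m" allocation modifier, and
treating the fourth field as a short label with no spaces is an
assumption based on the format string above.

```c
#include <assert.h>
#include <stdio.h>

/* One progress record: "<pass> <cur> <max> <text>". */
struct progress {
    int pass;
    unsigned long cur;
    unsigned long max;
    char text[256];
};

/* Parse a single line into *p.
 * Returns 1 if all four fields were present, 0 otherwise. */
static int parse_progress_line(const char *line, struct progress *p)
{
    assert(line != NULL && p != NULL);
    return sscanf(line, "%d %lu %lu %255s",
                  &p->pass, &p->cur, &p->max, p->text) == 4;
}

/* Completion of the current pass as a whole percentage (0 if max is 0). */
static int percent_done(const struct progress *p)
{
    assert(p != NULL);
    if (p->max == 0)
        return 0;
    return (int)(100.0 * (double)p->cur / (double)p->max);
}
```

Parsing line by line also sidesteps one quirk of the fscanf() version:
a trailing "\n" in a scanf format matches any run of whitespace, so on
a live pipe the call may not return until the first character of the
next record shows up.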

Cheers,

						- Ted