Re: [PATCH v4 0/9] Generic per-sb io stats

On Sun, Mar 6, 2022 at 6:18 AM Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Sat, Mar 05, 2022 at 06:04:15PM +0200, Amir Goldstein wrote:
> >
> > Dave Chinner asked why the io stats should not be enabled for all
> > filesystems.  That change seems too bold for me so instead, I included
> > an extra patch to auto-enable per-sb io stats for blockdev filesystems.
>
> Perhaps something to consider is allowing users to be able to enable
> or disable I/O stats on per mount basis?
>
> Consider if a potential future user of this feature has servers with
> one or two 256-core AMD Epyc chips, and suppose that they have
> several thousand iSCSI-mounted file systems containing various
> software packages for use by Kubernetes jobs.  (Or even several
> thousand mounted overlay file systems.....)
>
> The size of the percpu counter is going to be *big* on a large CPU
> count machine, and the iostats structure has 5 of these per-cpu
> counters, so if you have one for every single mounted file system,
> even if the CPU slowdown isn't significant, the non-swappable kernel
> memory overhead might be quite large.
>
> So maybe a VFS-level mount option, say, "iostats" and "noiostats", and
> some kind of global option indicating whether the default should be
> iostats being enabled or disabled?  Bonus points if iostats can be
> enabled or disabled after the initial mount via remount operation.
>
> I could imagine some people only being interested to enable iostats on
> certain file systems, or certain classes of block devices --- so they
> might want it enabled on some ext4 file systems which are attached to
> physical devices, but not on the N thousand iSCSI or nbd mounts that
> also happen to be using ext4.
>

Those were my thoughts as well.

As a matter of fact, I started to have a go at implementing
"iostats"/"noiostats", and then I realized I have no clue how the designers
of the new mount option parser API intended new generic mount options like
these to be added, so I ended up reusing SB_MAND_LOCK for the test patch.

Was I supposed to extend the struct fs_context fields sb_flags/sb_flags_mask
to unsigned long and add new common SB_ flags in the high 32 bits, which
could then only be set via fsopen()/fsconfig() on a 64-bit arch?

Or did the designers have something completely different in mind?

Or perhaps dealing with running out of space for common SB_ flags was never
in scope for the new mount API?

Thanks,
Amir.
