Re: [PATCH v2 4/6] fs: report per-mount io stats

Amir Goldstein <amir73il@xxxxxxxxx> · Tue, 1 Mar 2022 12:56:58 +0200

On Tue, Mar 1, 2022 at 11:46 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, Feb 28, 2022 at 11:57:19PM +0200, Amir Goldstein wrote:
> > On Mon, Feb 28, 2022 at 11:11 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 28, 2022 at 01:39:08PM +0200, Amir Goldstein wrote:
> > > > Show optional collected per-mount io stats in /proc/<pid>/mountstats
> > > > for filesystems that do not implement their own show_stats() method
> > > > and opted-in to generic per-mount stats with FS_MOUNT_STATS flag.
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > > ---
> > > >  fs/mount.h          |  1 +
> > > >  fs/namespace.c      |  2 ++
> > > >  fs/proc_namespace.c | 13 +++++++++++++
> > > >  3 files changed, 16 insertions(+)
> > > >
> > > > diff --git a/fs/mount.h b/fs/mount.h
> > > > index f98bf4cd5b1a..2ab6308af78b 100644
> > > > --- a/fs/mount.h
> > > > +++ b/fs/mount.h
> > > > @@ -91,6 +91,7 @@ struct mount {
> > > >       int mnt_id;                     /* mount identifier */
> > > >       int mnt_group_id;               /* peer group identifier */
> > > >       int mnt_expiry_mark;            /* true if marked for expiry */
> > > > +     time64_t mnt_time;              /* time of mount */
> > > >       struct hlist_head mnt_pins;
> > > >       struct hlist_head mnt_stuck_children;
> > > >  } __randomize_layout;
> > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > index 3fb8f11a42a1..546f07ed44c5 100644
> > > > --- a/fs/namespace.c
> > > > +++ b/fs/namespace.c
> > > > @@ -220,6 +220,8 @@ static struct mount *alloc_vfsmnt(const char *name)
> > > >               mnt->mnt_count = 1;
> > > >               mnt->mnt_writers = 0;
> > > >  #endif
> > > > +             /* For proc/<pid>/mountstats */
> > > > +             mnt->mnt_time = ktime_get_seconds();
> > > >
> > > >               INIT_HLIST_NODE(&mnt->mnt_hash);
> > > >               INIT_LIST_HEAD(&mnt->mnt_child);
> > > > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
> > > > index 49650e54d2f8..d744fb8543f5 100644
> > > > --- a/fs/proc_namespace.c
> > > > +++ b/fs/proc_namespace.c
> > > > @@ -232,6 +232,19 @@ static int show_vfsstat(struct seq_file *m, struct vfsmount *mnt)
> > > >       if (sb->s_op->show_stats) {
> > > >               seq_putc(m, ' ');
> > > >               err = sb->s_op->show_stats(m, mnt_path.dentry);
> > > > +     } else if (mnt_has_stats(mnt)) {
> > > > +             /* Similar to /proc/<pid>/io */
> > > > +             seq_printf(m, "\n"
> > > > +                        "\ttimes: %lld %lld\n"
> > > > +                        "\trchar: %lld\n"
> > > > +                        "\twchar: %lld\n"
> > > > +                        "\tsyscr: %lld\n"
> > > > +                        "\tsyscw: %lld\n",
> > > > +                        r->mnt_time, ktime_get_seconds(),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_RD),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_CHARS_WR),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_RD),
> > > > +                        mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_WR));
> > >
> > > This doesn't scale as {cpus, mounts, counters, read frequency}
> > > matrix explodes.  Please iterate the per-mount per cpu counters
> > > once, adding up all counters in one pass to an array on stack, then
> > > print them all from the array.
> >
> > I am planning to move to per-sb iostats and was thinking of using
> > an array of 4 struct percpu_counter. That will make this sort of iteration
> > more challenging.
>
> No, it would get rid of it entirely. percpu_counter_read_positive()
> does not require any summing at all - that's a much better solution
> than a hand rolled set of percpu counters. Please do this.
>
> > Do you really think the read frequency of /proc/self/mountstats
> > warrants such performance optimization?
>
> We get bug reports every so often about the overhead of frequently
> summing per-cpu stats on large systems. Nothing ratelimits or
> restricts access to /proc/self/mountstats, so when you have a
> thousand CPUs and a million monkeys...
>
> Rule of thumb: don't do computationally expensive things to generate
> data for globally accessible sysfs files.
>
> > It's not like the case of the mighty struct xfsstats.
> > It is only going to fold 4 per cpu iterations into 1.
> > This doesn't look like a game changer to me.
> > Am I missing something?
>
> I'm just pointing out something we've had problems with in
> the past and are trying to help you avoid making the same mistakes.
> That's what reviewers are supposed to do, yes?

Yes, thank you :)

Amir.