On Mon, Feb 28, 2022 at 11:57:19PM +0200, Amir Goldstein wrote: > On Mon, Feb 28, 2022 at 11:11 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > On Mon, Feb 28, 2022 at 01:39:08PM +0200, Amir Goldstein wrote: > > > Show optional collected per-mount io stats in /proc/<pid>/mountstats > > > for filesystems that do not implement their own show_stats() method > > > and opted-in to generic per-mount stats with FS_MOUNT_STATS flag. > > > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > --- > > > fs/mount.h | 1 + > > > fs/namespace.c | 2 ++ > > > fs/proc_namespace.c | 13 +++++++++++++ > > > 3 files changed, 16 insertions(+) > > > > > > diff --git a/fs/mount.h b/fs/mount.h > > > index f98bf4cd5b1a..2ab6308af78b 100644 > > > --- a/fs/mount.h > > > +++ b/fs/mount.h > > > @@ -91,6 +91,7 @@ struct mount { > > > int mnt_id; /* mount identifier */ > > > int mnt_group_id; /* peer group identifier */ > > > int mnt_expiry_mark; /* true if marked for expiry */ > > > + time64_t mnt_time; /* time of mount */ > > > struct hlist_head mnt_pins; > > > struct hlist_head mnt_stuck_children; > > > } __randomize_layout; > > > diff --git a/fs/namespace.c b/fs/namespace.c > > > index 3fb8f11a42a1..546f07ed44c5 100644 > > > --- a/fs/namespace.c > > > +++ b/fs/namespace.c > > > @@ -220,6 +220,8 @@ static struct mount *alloc_vfsmnt(const char *name) > > > mnt->mnt_count = 1; > > > mnt->mnt_writers = 0; > > > #endif > > > + /* For proc/<pid>/mountstats */ > > > + mnt->mnt_time = ktime_get_seconds(); > > > > > > INIT_HLIST_NODE(&mnt->mnt_hash); > > > INIT_LIST_HEAD(&mnt->mnt_child); > > > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c > > > index 49650e54d2f8..d744fb8543f5 100644 > > > --- a/fs/proc_namespace.c > > > +++ b/fs/proc_namespace.c > > > @@ -232,6 +232,19 @@ static int show_vfsstat(struct seq_file *m, struct vfsmount *mnt) > > > if (sb->s_op->show_stats) { > > > seq_putc(m, ' '); > > > err = sb->s_op->show_stats(m, mnt_path.dentry); > > > + } else if (mnt_has_stats(mnt)) { > > > + /* Similar to /proc/<pid>/io */ > > > + seq_printf(m, "\n" > > > + "\ttimes: %lld %lld\n" > > > + "\trchar: %lld\n" > > > + "\twchar: %lld\n" > > > + "\tsyscr: %lld\n" > > > + "\tsyscw: %lld\n", > > > + r->mnt_time, ktime_get_seconds(), > > > + mnt_iostats_counter_read(r, MNTIOS_CHARS_RD), > > > + mnt_iostats_counter_read(r, MNTIOS_CHARS_WR), > > > + mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_RD), > > > + mnt_iostats_counter_read(r, MNTIOS_SYSCALLS_WR)); > > > > This doesn't scale as {cpus, mounts, counters, read frequency} > > matrix explodes. Please iterate the per-mount per cpu counters > > once, adding up all counters in one pass to an array on stack, then > > print them all from the array. > > I am planning to move to per-sb iostats and was thinking of using > an array of 4 struct percpu_counter. That will make this sort of iteration > more challenging. No, it would get rid of it entirely. percpu_counter_read_positive() does not require any summing at all - that's a much better solution than a hand rolled set of percpu counters. Please do this. > Do you really think the read frequency of /proc/self/mountstats > warrants such performance optimization? We get bug reports every so often about the overhead of frequently summing per-cpu stats on large systems. Nothing ratelimits or restricts access to /proc/self/mountstats, so when you have a thousand CPUs and a million monkeys... Rule of thumb: don't do computationally expensive things to generate data for globally accessible sysfs files. > It's not like the case of the mighty struct xfsstats. > It is only going to fold 4 per cpu iterations into 1. > This doesn't look like a game changer to me. > Am I missing something? I'm just pointing out something we've had problems with in the past and are trying to help you avoid making the same mistakes. That's what reviewers are supposed to do, yes? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx