Re: [PATCH v3 0/6] Generic per-sb io stats

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 2 Mar 2022 18:59:51 +0200

On Wed, Mar 2, 2022 at 10:27 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Wed, Mar 02, 2022 at 09:43:50AM +0200, Amir Goldstein wrote:
> > On Wed, Mar 2, 2022 at 8:59 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 01, 2022 at 08:42:15PM +0200, Amir Goldstein wrote:
> > > > Miklos,
> > > >
> > > > Following your feedback on v2 [1], I moved the iostats to per-sb.
> > > >
> > > > Thanks,
> > > > Amir.
> > > >
> > > > [1] https://lore.kernel.org/linux-unionfs/20220228113910.1727819-1-amir73il@xxxxxxxxx/
> > > >
> > > > Changes since v2:
> > > > - Change from per-mount to per-sb io stats (szeredi)
> > > > - Avoid percpu loop when reading mountstats (dchinner)
> > > >
> > > > Changes since v1:
> > > > - Opt-in for per-mount io stats for overlayfs and fuse
> > >
> > > Why make it optional only for specific filesystem types? Shouldn't
> > > every superblock capture these stats and export them in exactly the
> > > same place?
> > >
> >
> > I am not sure what you are asking.
> >
> > Any filesystem can opt-in to get those generic io stats.
>
> Yes, but why even make it opt-in? Why not just set these up
> unconditionally in alloc_super() for all filesystems? Either this is
> useful information for userspace montioring and diagnostics, or it's
> not useful at all. If it is useful, then all superblocks should
> export this stuff rather than just some special subset of
> filesystems where individual maintainers have noticed it and thought
> "that might be useful".
>
> Just enable it for every superblock....
>

Not that simple.
First of all alloc_super() is used for all sorts of internal kernel sb
(e.g. pipes) that really don't need those stats.

Second, counters can have performance impact.
Depending on the fs, overhead may or may not be significant.
I used the attached patch for testing and ran some micro benchmarks
on tmpfs (10M small read/writes).
The patch hacks -omand for enabling iostats [*]

The results were not great. up to 20% slower when io size > default
batch size (32).
Increasing the counter batch size for rchar/wchar to 1K fixed this
micro benchmark,
but it may not be a one size fits all situation.
So I'd rather be cautious and not enable the feature unconditionally.

Miklos,

In the patches to enable per-sb iostats for fuse and overlayfs I argued why
I think that iostats are more important for fuse/overlayfs than for
other local fs.

Do you agree with my arguments for either fuse/overlayfs or both?
If so, would you rather that per-sb iostats be enabled unconditionally for
fuse/overlayfs or via an opt-in mount option?

Thanks,
Amir.

[*] I tried to figure out how fsconfig() could be used to implement a new
generic sb property (i.e. SB_IOSTATS), but I failed to understand if and how
the new mount API design is intended to address new generic properties, so
I resorted to the SB_MANDLOCK hack.
From 669d899a9e40c2c519a850a65e5001f2b03b05a8 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@xxxxxxxxx>
Date: Wed, 2 Mar 2022 11:03:29 +0200
Subject: [PATCH v3 7/6] fs: enable per-sb I/O stats for any filesystem

Override SB_MANDLOCK for testing.

Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
---
 fs/fuse/inode.c            |  8 ++++----
 fs/namespace.c             |  2 ++
 fs/overlayfs/super.c       |  4 ----
 fs/super.c                 |  3 +++
 include/linux/fs_iostats.h | 30 ++++++++++++++++++++++++++++++
 5 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index f19c666b9ac3..4f0316709df0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -145,8 +145,10 @@ static int fuse_reconfigure(struct fs_context *fsc)
 	struct super_block *sb = fsc->root->d_sb;
 
 	sync_filesystem(sb);
+#ifndef CONFIG_FS_IOSTATS
 	if (fsc->sb_flags & SB_MANDLOCK)
 		return -EINVAL;
+#endif
 
 	return 0;
 }
@@ -1514,13 +1516,11 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	struct dentry *root_dentry;
 	int err;
 
+#ifndef CONFIG_FS_IOSTATS
 	err = -EINVAL;
 	if (sb->s_flags & SB_MANDLOCK)
 		goto err;
-
-	err = sb_iostats_init(sb);
-	if (err && err != -EOPNOTSUPP)
-		goto err;
+#endif
 
 	rcu_assign_pointer(fc->curr_bucket, fuse_sync_bucket_alloc());
 	fuse_sb_defaults(sb);
diff --git a/fs/namespace.c b/fs/namespace.c
index de6fae84f1a1..eb4ebb5081d9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3298,8 +3298,10 @@ int path_mount(const char *dev_name, struct path *path,
 		return ret;
 	if (!may_mount())
 		return -EPERM;
+#ifndef CONFIG_FS_IOSTATS
 	if (flags & SB_MANDLOCK)
 		warn_mandlock();
+#endif
 
 	/* Default to relatime unless overriden */
 	if (!(flags & MS_NOATIME))
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 94ab6adebe07..d2f569260edc 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1980,10 +1980,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 	sb->s_d_op = &ovl_dentry_operations;
 
-	err = sb_iostats_init(sb);
-	if (err && err != -EOPNOTSUPP)
-		goto out;
-
 	err = -ENOMEM;
 	ofs = kzalloc(sizeof(struct ovl_fs), GFP_KERNEL);
 	if (!ofs)
diff --git a/fs/super.c b/fs/super.c
index c447cadb402b..ebaf650f72bf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -180,6 +180,7 @@ static void destroy_unused_super(struct super_block *s)
 	up_write(&s->s_umount);
 	list_lru_destroy(&s->s_dentry_lru);
 	list_lru_destroy(&s->s_inode_lru);
+	sb_iostats_destroy(s);
 	security_sb_free(s);
 	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
@@ -1522,6 +1523,8 @@ int vfs_get_tree(struct fs_context *fc)
 	sb->s_flags |= SB_BORN;
 
 	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
+	if (!error)
+		error = sb_iostats_configure(sb);
 	if (unlikely(error)) {
 		fc_drop_locked(fc);
 		return error;
diff --git a/include/linux/fs_iostats.h b/include/linux/fs_iostats.h
index 60d1efbea7d9..af5b5ff201aa 100644
--- a/include/linux/fs_iostats.h
+++ b/include/linux/fs_iostats.h
@@ -6,6 +6,13 @@
 #include <linux/percpu_counter.h>
 #include <linux/slab.h>
 
+/*
+ * Override SB_MANDLOCK for testing.
+ *
+ * TODO: use fsopen()/fsconfig() flag parameters?
+ */
+#define SB_IOSTATS	SB_MANDLOCK
+
 /* Similar to task_io_accounting members */
 enum {
 	SB_IOSTATS_CHARS_RD,	/* bytes read via syscalls */
@@ -47,6 +54,10 @@ static inline int sb_iostats_init(struct super_block *sb)
 {
 	int err;
 
+	/* Already initialized? */
+	if (sb->s_iostats)
+		return 0;
+
 	sb->s_iostats = kmalloc(sizeof(struct sb_iostats), GFP_KERNEL);
 	if (!sb->s_iostats)
 		return -ENOMEM;
@@ -60,6 +71,19 @@ static inline int sb_iostats_init(struct super_block *sb)
 	}
 
 	sb->s_iostats->start_time = ktime_get_seconds();
+	sb->s_flags |= SB_IOSTATS;
+	return 0;
+}
+
+/* Enable/disable according to SB_IOSTATS flag */
+static inline int sb_iostats_configure(struct super_block *sb)
+{
+	bool want_iostats = (sb->s_flags & SB_IOSTATS);
+
+	if (want_iostats && !sb->s_iostats)
+		return sb_iostats_init(sb);
+	else if (!want_iostats && sb->s_iostats)
+		sb_iostats_destroy(sb);
 	return 0;
 }
 
@@ -109,6 +133,12 @@ static inline void sb_iostats_destroy(struct super_block *sb)
 {
 }
 
+static inline int sb_iostats_configure(struct super_block *sb)
+{
+	sb->s_flags &= ~SB_IOSTATS;
+	return 0;
+}
+
 static inline void sb_iostats_counter_inc(struct super_block *sb, int id)
 {
 }
-- 
2.25.1