On Tue, Sep 14, 2021 at 11:20:29AM +1000, Dave Chinner wrote:
> On Mon, Sep 13, 2021 at 08:27:50AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Sep 13, 2021 at 07:41:21AM +0200, Christoph Hellwig wrote:
> > > Trivial conversion to the seq_file based sysfs attributes.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> > > ---
> > >  fs/xfs/xfs_stats.c | 24 +++++-------
> > >  fs/xfs/xfs_stats.h |  2 +-
> > >  fs/xfs/xfs_sysfs.c | 96 +++++++++++++++++++++++-----------------------
> > >  3 files changed, 58 insertions(+), 64 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
> > > index 20e0534a772c9..71e7a84ba0403 100644
> > > --- a/fs/xfs/xfs_stats.c
> > > +++ b/fs/xfs/xfs_stats.c
> > > @@ -16,10 +16,9 @@ static int counter_val(struct xfsstats __percpu *stats, int idx)
> > >  	return val;
> > >  }
> > >  
> > > -int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > +void xfs_stats_format(struct xfsstats __percpu *stats, struct seq_file *sf)
> > >  {
> > >  	int i, j;
> > > -	int len = 0;
> > >  	uint64_t xs_xstrat_bytes = 0;
> > >  	uint64_t xs_write_bytes = 0;
> > >  	uint64_t xs_read_bytes = 0;
> > > @@ -58,13 +57,12 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > >  	/* Loop over all stats groups */
> > >  
> > >  	for (i = j = 0; i < ARRAY_SIZE(xstats); i++) {
> > > -		len += scnprintf(buf + len, PATH_MAX - len, "%s",
> > > -				xstats[i].desc);
> > > +		seq_printf(sf, "%s", xstats[i].desc);
> > > +
> > >  		/* inner loop does each group */
> > >  		for (; j < xstats[i].endpoint; j++)
> > > -			len += scnprintf(buf + len, PATH_MAX - len, " %u",
> > > -					counter_val(stats, j));
> > > -		len += scnprintf(buf + len, PATH_MAX - len, "\n");
> > > +			seq_printf(sf, " %u", counter_val(stats, j));
> > > +		seq_printf(sf, "\n");
> > >  	}
> > >  	/* extra precision counters */
> > >  	for_each_possible_cpu(i) {
> > > @@ -74,18 +72,14 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > >  		defer_relog += per_cpu_ptr(stats, i)->s.defer_relog;
> > >  	}
> > >  
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n",
> > > +	seq_printf(sf, "xpc %Lu %Lu %Lu\n",
> > >  			xs_xstrat_bytes, xs_write_bytes, xs_read_bytes);
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "defer_relog %llu\n",
> > > -			defer_relog);
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "debug %u\n",
> > > +	seq_printf(sf, "defer_relog %llu\n", defer_relog);
> > >  #if defined(DEBUG)
> > > -		1);
> > > +	seq_printf(sf, "debug 1\n");
> > >  #else
> > > -		0);
> > > +	seq_printf(sf, "debug 0\n");
> > >  #endif
> > > -
> > > -	return len;
> > >  }
> > 
> > That is a sysfs file?  What happened to the "one value per file"
> > rule here?
> 
> There is no "rule" that says sysfs files must contain one value per
> file; the documentation says that one value per file is the
> "preferred" format.  Documentation/filesystems/sysfs.rst:
> 
> [...]
> Attributes
> ...
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.
> [...]

An array of values is one thing, like "what are the power states for
this device".  A list of different key/value pairs is an entirely
different thing.

> We are exposing a large array of integer values here, so multiple
> values per file are explicitly considered an acceptable format.

Not really; that was not the goal of sysfs at all.
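To be concrete about the difference, the model sysfs expects looks
roughly like the sketch below: one attribute, one value, emitted with
sysfs_emit().  (This is only an illustration; the to_xfs_mount() and
xfs_sum_write_bytes() helpers are made up, not taken from the patch.)

	/* One value per file: each counter becomes its own attribute. */
	static ssize_t write_bytes_show(struct kobject *kobj,
					struct kobj_attribute *attr,
					char *buf)
	{
		/* to_xfs_mount(): a made-up container_of() wrapper */
		struct xfs_mount *mp = to_xfs_mount(kobj);

		/* sysfs_emit() clamps the output to a single page */
		return sysfs_emit(buf, "%llu\n", xfs_sum_write_bytes(mp));
	}
	static struct kobj_attribute write_bytes_attr =
		__ATTR_RO(write_bytes);

If a counter goes away, its file goes away, and userspace can detect
that with a single stat() call instead of re-parsing a blob.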
> Further, as there are roughly 200 individual stats in this file and
> calculating each stat requires per-cpu aggregation, the cost of
> calculating and reading each stat individually is prohibitive, not
> just inefficient.

Have you measured it?  How often does the file get read, and by what
tools?

We have learned from our past mistakes in /proc, where doing this
meant keeping obsolete values around and constantly tweaking userspace
parsers.  That is why we made sysfs one-value-per-file: if the file is
not there, the value is not there, which makes future changes much
easier to handle.

> So, yes, we might have multiple lines in the file that you can frown
> about, but OTOH the file format has been exposed as a kernel ABI for
> a couple of decades via /proc/fs/xfs/stat.

proc had no such rules, but we have learned :)

> Hence exposing it in sysfs to provide a more fine-grained breakdown
> of the stats (per mount instead of global) is a no-brainer.  We
> don't have to rewrite the parsing engines in multiple userspace
> monitoring programs to extract this information from the kernel -
> they just create a new instance and read a different file and it all
> just works.

But then you run into the maximum size of a sysfs file (PAGE_SIZE)
and things break down.  Please don't do this.

> Indeed, there's precedent for such /proc file formats in more
> fine-grained sysfs files, e.g. /sys/bus/node/devices/node<n>/vmstat
> and /sys/bus/node/devices/node<n>/meminfo retain the same format
> (and hence userspace parsers) for the per-node stats as /proc/vmstat
> and /proc/meminfo use for the global stats...

And I have complained about those files many times in the past.  They
are also running into problems in the places that have to deal with
them.

> tl;dr: the file contains arrays of values, it's inefficient to read
> values one at a time, it's a pre-existing ABI-constrained file
> format, there's precedent in core kernel statistics implementations,
> and the documented guidelines allow this sort of usage in these
> cases.

I would prefer not to do this, and I will not take core sysfs changes
to make it any easier.  That is one big reason why I don't like just
making sysfs use the seq_file API: it would allow stuff like this to
propagate to other places in the kernel.

Maybe I should cut the maximum size of a sysfs file down to
PAGE_SIZE/4 or less; that might be better :)

thanks,

greg k-h
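P.S. To put the PAGE_SIZE point in code form: a sysfs show() method
writes into exactly one page, no matter how much data it has.  A rough
sketch (the 200-counter loop and stat_val() are invented for
illustration):

	static ssize_t stats_show(struct kobject *kobj,
				  struct kobj_attribute *attr, char *buf)
	{
		int i, len = 0;

		/* try to cram ~200 "name value" lines into one buffer */
		for (i = 0; i < 200; i++)
			len += sysfs_emit_at(buf, len, "stat%d %u\n",
					     i, stat_val(i));

		/*
		 * sysfs_emit_at() refuses to write past PAGE_SIZE, so
		 * whatever does not fit in one page is dropped and
		 * userspace reads a silently truncated file.
		 */
		return len;
	}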