On Sat, Mar 05, 2022 at 11:18:34PM -0500, Theodore Ts'o wrote:
> On Sat, Mar 05, 2022 at 06:04:15PM +0200, Amir Goldstein wrote:
> > 
> > Dave Chinner asked why the io stats should not be enabled for all
> > filesystems. That change seems too bold for me so instead, I included
> > an extra patch to auto-enable per-sb io stats for blockdev filesystems.
> 
> Perhaps something to consider is allowing users to be able to enable
> or disable I/O stats on a per-mount basis?
> 
> Consider if a potential future user of this feature has servers with
> one or two 256-core AMD Epyc chips, and suppose that they have
> several thousand iSCSI mounted file systems containing various
> software packages for use by Kubernetes jobs. (Or even several
> thousand mounted overlay file systems.....)
> 
> The size of the percpu counter is going to be *big* on a large CPU
> count machine, and the iostats structure has 5 of these per-cpu
> counters, so if you have one for every single mounted file system,
> even if the CPU slowdown isn't significant, the non-swappable kernel
> memory overhead might be quite large.

A percpu counter on a 256 core machine is ~1kB. Adding 5kB to the
struct superblock isn't a big deal for a machine of this size, even if
you have thousands of superblocks - we're talking a few *megabytes* of
extra memory in a machine that would typically have hundreds of GB of
RAM.

Seriously, the memory overhead of the per-cpu counters is noise
compared to the memory footprint of, say, the stacks that need to be
allocated for every background worker thread the filesystem needs.

Yeah, I know, we have ~175 per-cpu stats counters per XFS superblock
(we already cover the 4 counters Amir is proposing to add as generic
SB counters), and we have half a dozen dedicated worker threads per
mount. Yet systems still function just fine when there are thousands
of XFS filesystems and thousands of CPUs.

Seriously, a small handful of per-cpu counters that capture
information for all superblocks is not a big deal. Small systems will
have relatively little overhead, and large systems have the memory to
handle it.

> So maybe a VFS-level mount option, say, "iostats" and "noiostats", and
> some kind of global option indicating whether the default should be
> iostats being enabled or disabled? Bonus points if iostats can be
> enabled or disabled after the initial mount via a remount operation.

Can we please just avoid mount options for stuff like this? It'll just
never get tested unless it defaults to on, and then almost no-one will
ever turn it off, because why would you bother tweaking something that
has no noticeable impact but can give useful insights into the
workload that is running?

I don't care one way or another here because this is essentially
duplicating something we've had in XFS for 20+ years. What I want to
avoid is blowing out the test matrix even further. Adding optional
features has a cost in terms of testing time, so if a feature is only
rarely going to be turned on then we shouldn't add it at all. If it's
only rarely going to be turned off, OTOH, then we should just make it
ubiquitous and available for everything so that it's always tested.

Hence, AFAICT, the only real option for yes/no support is the Kconfig
option. If the kernel builder turns it on, it is on for everything,
otherwise it is off for everything.
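
For concreteness, here is a rough sketch of what a Kconfig-gated set of
per-sb counters could look like. This is illustrative only, not Amir's
actual patch: the CONFIG_FS_IOSTATS symbol, the sb_iostats/sb_iostat
names, the helpers and the specific stat fields are all made up for the
example; the real series may count and name these differently.

#include <linux/percpu_counter.h>

/* Hypothetical stat indices - the real series may differ. */
enum sb_iostat {
	SB_IOSTAT_RCHAR,	/* bytes read */
	SB_IOSTAT_WCHAR,	/* bytes written */
	SB_IOSTAT_SYSCR,	/* read calls */
	SB_IOSTAT_SYSCW,	/* write calls */
	SB_IOSTAT_NR,
};

#ifdef CONFIG_FS_IOSTATS	/* hypothetical Kconfig symbol */
struct sb_iostats {
	/*
	 * percpu_counter keeps a 4 byte slot per possible CPU, so the
	 * per-cpu storage is ~1kB per counter on a 256 CPU machine,
	 * i.e. a few kB per superblock in total.
	 */
	struct percpu_counter	counters[SB_IOSTAT_NR];
};

static inline int sb_iostats_init(struct sb_iostats *st)
{
	int i, err;

	for (i = 0; i < SB_IOSTAT_NR; i++) {
		err = percpu_counter_init(&st->counters[i], 0, GFP_KERNEL);
		if (err) {
			/* Unwind the counters we already set up. */
			while (--i >= 0)
				percpu_counter_destroy(&st->counters[i]);
			return err;
		}
	}
	return 0;
}

static inline void sb_iostats_add(struct sb_iostats *st,
				  enum sb_iostat item, s64 amount)
{
	/* Cheap per-cpu add in the fast path, summed only when read. */
	percpu_counter_add(&st->counters[item], amount);
}
#else	/* !CONFIG_FS_IOSTATS: everything compiles away */
struct sb_iostats { };
static inline int sb_iostats_init(struct sb_iostats *st) { return 0; }
static inline void sb_iostats_add(struct sb_iostats *st,
				  enum sb_iostat item, s64 amount) { }
#endif

Something like this keeps the fast path to a single per-cpu add, the
memory cost is bounded at roughly 4 bytes of per-cpu storage per
possible CPU per counter (which is where the ~1kB-per-counter number
above comes from), and when the Kconfig symbol is off the whole thing
compiles away for everyone instead of becoming another axis of the
test matrix.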

> I could imagine some people only being interested to enable iostats on
> certain file systems, or certain classes of block devices --- so they
> might want it enabled on some ext4 file systems which are attached to
> physical devices, but not on the N thousand iSCSI or nbd mounts that
> also happen to be using ext4.

That seems ... fairly contrived.

Block device IO stats are not turned on and off based on the block
device type - they are generic. Network device stats are not turned on
and off based on the network device - they are generic. Why should
per-filesystem IO stats be special and different to everything else?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx