On Mon, Mar 7, 2022 at 2:14 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 05, 2022 at 11:18:34PM -0500, Theodore Ts'o wrote:
> > On Sat, Mar 05, 2022 at 06:04:15PM +0200, Amir Goldstein wrote:
> > >
> > > Dave Chinner asked why the io stats should not be enabled for all
> > > filesystems. That change seems too bold for me, so instead I included
> > > an extra patch to auto-enable per-sb io stats for blockdev filesystems.
> >
> > Perhaps something to consider is allowing users to enable or disable
> > I/O stats on a per-mount basis?
> >
> > Consider if a potential future user of this feature has servers with
> > one or two 256-core AMD Epyc chips, and suppose that they have
> > several thousand iSCSI-mounted file systems containing various
> > software packages for use by Kubernetes jobs. (Or even several
> > thousand mounted overlay file systems.....)
> >
> > The size of the percpu counter is going to be *big* on a large CPU
> > count machine, and the iostats structure has 5 of these per-cpu
> > counters, so if you have one for every single mounted file system,
> > even if the CPU slowdown isn't significant, the non-swappable kernel
> > memory overhead might be quite large.
>
> A percpu counter on a 256 core machine is ~1kB. Adding 5kB to the
> struct superblock isn't a big deal for a machine of this size, even
> if you have thousands of superblocks - we're talking a few
> *megabytes* of extra memory in a machine that would typically have
> hundreds of GB of RAM. Seriously, the memory overhead of the per-cpu
> counters is noise compared to the memory footprint of, say, the
> stacks needing to be allocated for every background worker thread
> the filesystem needs.
>
> Yeah, I know, we have ~175 per-cpu stats counters per XFS superblock
> (we already cover the 4 counters Amir is proposing to add as generic
> SB counters), and we have half a dozen dedicated worker threads per
> mount. Yet systems still function just fine when there are thousands
> of XFS filesystems and thousands of CPUs.
>
> Seriously, a small handful of per-cpu counters that capture
> information for all superblocks is not a big deal. Small systems
> will have relatively little overhead, large systems have the memory
> to handle it.
>
> > So maybe a VFS-level mount option, say, "iostats" and "noiostats", and
> > some kind of global option indicating whether the default should be
> > iostats being enabled or disabled? Bonus points if iostats can be
> > enabled or disabled after the initial mount via a remount operation.
>
> Can we please just avoid mount options for stuff like this? It'll
> just never get tested unless it defaults to on, and then almost
> no-one will ever turn it off, because why would you bother tweaking
> something that has no noticeable impact but can give useful insights
> into the workload that is running?
>
> I don't care one way or another here because this is essentially
> duplicating something we've had in XFS for 20+ years. What I want to
> avoid is blowing out the test matrix even further. Adding optional
> features has a cost in terms of testing time, so if it's a feature
> that is only rarely going to be turned on then we shouldn't add it
> at all. If it's only rarely going to be turned off, OTOH, then we
> should just make it ubiquitous and available for everything so it's
> always tested.
>
> Hence, AFAICT, the only real option for yes/no support is the
> Kconfig option. If the kernel builder turns it on, it is on for
> everything, otherwise it is off for everything.
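
Just to put numbers on the memory overhead question above, here is a rough
back-of-the-envelope sketch. It assumes one 32-bit per-CPU delta per counter,
which is not the exact in-kernel percpu_counter layout, so the figures are
ballpark only:

#include <stdio.h>

int main(void)
{
	const unsigned long cpus = 256;			/* large EPYC box */
	const unsigned long counters_per_sb = 5;	/* proposed per-sb iostats */
	const unsigned long superblocks = 1000;		/* "thousands of mounts" */

	/* One 32-bit delta per CPU is ~1 KiB per counter on 256 cores. */
	unsigned long per_counter = cpus * sizeof(int);
	unsigned long per_sb = counters_per_sb * per_counter;
	unsigned long total = superblocks * per_sb;

	printf("per counter:    %lu bytes\n", per_counter);
	printf("per superblock: %lu bytes\n", per_sb);
	printf("1000 mounts:    %.1f MiB\n", total / (1024.0 * 1024.0));
	return 0;
}

That comes out to roughly 5 MiB per thousand mounts on a 256-core machine,
in line with the "few megabytes" figure above.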
I agree with the sentiment above, and I also share Ted's concern that we may
be overlooking some aspect, so my preference would be for Miklos to take the
sb_iostats infra patches through his tree to enable iostats for
fuse/overlayfs (I argued in the commit message why I think they deserve
special treatment).

Regarding the last patch - Ted, would you be more comfortable if it came with
yet another Kconfig option (e.g. FS_IOSTATS_DEFAULT)? Or perhaps with a
/proc/sys/fs/ fail-safety off switch (like protected_symlinks)? That would
give distros more options.

Thanks,
Amir.
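
P.S. To make the /proc/sys/fs/ suggestion a bit more concrete, here is a
rough sketch modelled on protected_symlinks. The names (sysctl_sb_iostats,
the "sb_iostats" procname) are made up for illustration and are not from the
patch set:

/*
 * Illustrative only: a global 0/1 switch at /proc/sys/fs/sb_iostats,
 * along the lines of fs.protected_symlinks.
 */
#include <linux/cache.h>
#include <linux/init.h>
#include <linux/sysctl.h>

static int sysctl_sb_iostats __read_mostly = 1;	/* default: enabled */

static struct ctl_table sb_iostats_sysctls[] = {
	{
		.procname	= "sb_iostats",
		.data		= &sysctl_sb_iostats,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
	{ }
};

static int __init sb_iostats_sysctl_init(void)
{
	/* Real code would check the return value here. */
	register_sysctl("fs", sb_iostats_sysctls);
	return 0;
}
fs_initcall(sb_iostats_sysctl_init);

The per-sb accounting would then consult sysctl_sb_iostats before updating
the counters, so distros could ship the feature built in but still turn it
off globally at runtime.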