Re: [PATCH v4 0/9] Generic per-sb io stats

On Mon, Mar 7, 2022 at 2:14 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 05, 2022 at 11:18:34PM -0500, Theodore Ts'o wrote:
> > On Sat, Mar 05, 2022 at 06:04:15PM +0200, Amir Goldstein wrote:
> > >
> > > Dave Chinner asked why the io stats should not be enabled for all
> > > filesystems.  That change seems too bold for me, so instead I included
> > > an extra patch to auto-enable per-sb io stats for blockdev filesystems.
> >
> > Perhaps something to consider is allowing users to enable or
> > disable I/O stats on a per-mount basis?
> >
> > Consider if a potential future user of this feature has servers with
> > one or two 256-core AMD Epyc chips, and suppose that they have
> > several thousand iSCSI-mounted file systems containing various
> > software packages for use by Kubernetes jobs.  (Or even several
> > thousand mounted overlay file systems.....)
> >
> > The size of the percpu counter is going to be *big* on a large CPU
> > count machine, and the iostats structure has 5 of these per-cpu
> > counters, so if you have one for every single mounted file system,
> > even if the CPU slowdown isn't significant, the non-swappable kernel
> > memory overhead might be quite large.
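
(For reference, the object being discussed would be roughly of this
shape -- the type and field names below are my own illustration built
on the generic percpu_counter API, not the names used in the patch
series:

#include <linux/percpu_counter.h>

/* Illustrative sketch only, not the actual sb_iostats definition. */
enum {
        SB_IOSTAT_RCHAR,        /* bytes read */
        SB_IOSTAT_WCHAR,        /* bytes written */
        SB_IOSTAT_SYSCR,        /* read syscalls */
        SB_IOSTAT_SYSCW,        /* write syscalls */
        SB_IOSTAT_DSYNC,        /* data sync calls */
        SB_IOSTAT_MAX,
};

struct sb_iostats {
        struct percpu_counter   counter[SB_IOSTAT_MAX];
};

The per-CPU payload of each counter scales with the number of
possible CPUs, which is what drives the memory concern above.)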
>
> A percpu counter on a 256-core machine is ~1kB. Adding 5kB to the
> struct superblock isn't a big deal for a machine of this size, even
> if you have thousands of superblocks - we're talking a few
> *megabytes* of extra memory in a machine that would typically have
> hundreds of GB of RAM. Seriously, the memory overhead of the per-cpu
> counters is noise compared to the memory footprint of, say, the
> stacks needing to be allocated for every background worker thread
> the filesystem needs.
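
To spell out the arithmetic behind those numbers (rough figures,
assuming the per-CPU payload of a percpu counter is a single s32):

/*
 * per-CPU storage per counter:  256 CPUs * 4 bytes  ~= 1 KiB
 * counters per superblock:      5 * 1 KiB           ~= 5 KiB
 * a thousand superblocks:       1000 * 5 KiB        ~= 5 MiB
 */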
>
> Yeah, I know, we have ~175 per-cpu stats counters per XFS superblock
> (we already cover the 4 counters Amir is proposing to add as generic
> SB counters), and we have half a dozen dedicated worker threads per
> mount. Yet systems still function just fine when there are thousands
> of XFS filesystems and thousands of CPUs.
>
> Seriously, a small handful of per-cpu counters that capture
> information for all superblocks is not a big deal. Small systems
> will have relatively little overhead, large systems have the memory
> to handle it.
>
> > So maybe a VFS-level mount option, say, "iostats" and "noiostats", and
> > some kind of global option indicating whether the default should be
> > iostats being enabled or disabled?  Bonus points if iostats can be
> > enabled or disabled after the initial mount via remount operation.
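
(Purely as an illustration of what such a VFS-level switch could look
like -- the flag name and bit value below are invented for this
sketch, not taken from any patch:

#include <linux/bits.h>
#include <linux/fs.h>

/* Hypothetical generic sb flag that "iostats"/"noiostats" would toggle. */
#define SB_IOSTATS      BIT(18)         /* bit value chosen only for illustration */

static inline bool sb_iostats_enabled(const struct super_block *sb)
{
        return sb->s_flags & SB_IOSTATS;
}

A remount could then set or clear the bit without involving the
filesystem itself.)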
>
> Can we please just avoid mount options for stuff like this? It'll
> just never get tested unless it defaults to on, and then almost
> no-one will ever turn it off because why would you bother tweaking
> something that has no noticeable impact but can give useful insight
> into the workload that is running?
>
> I don't care one way or another here because this is essentially
> duplicating something we've had in XFS for 20+ years. What I want to
> avoid is blowing out the test matrix even further. Adding optional
> features has a cost in terms of testing time, so if it's a feature
> that is only rarely going to be turned on then we shouldn't add it
> at all. If it's only rarely going to be turned off, OTOH, then we
> should just make it ubiquitous and available for everything so it's
> always tested.
>
> Hence, AFAICT, the only real option for yes/no support is the
> Kconfig option. If the kernel builder turns it on, it is on for
> everything, otherwise it is off for everything.
>

I agree with this sentiment, and I also share Ted's concern
that we may be overlooking some aspect, so my preference would
be for Miklos to take the sb_iostats infra patches through his tree
to enable iostats for fuse/overlayfs (I argued in the commit message
why I think they deserve special treatment).

Regarding the last patch -
Ted, would you be more comfortable if it came with yet another
Kconfig option (e.g. FS_IOSTATS_DEFAULT)? Or perhaps with a
fail-safe off switch under /proc/sys/fs/ (like protected_symlinks)?
That would give distros more options.
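
To illustrate the second idea: a sysctl knob wired up the same way
protected_symlinks is, with its default taken from the Kconfig
option. The symbol and Kconfig names below are hypothetical, not
from the series:

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/kconfig.h>
#include <linux/sysctl.h>

/* Default would come from a hypothetical CONFIG_FS_IOSTATS_DEFAULT. */
int sysctl_sb_iostats_enabled __read_mostly =
        IS_ENABLED(CONFIG_FS_IOSTATS_DEFAULT);

static struct ctl_table sb_iostats_sysctls[] = {
        {
                .procname       = "sb_iostats",
                .data           = &sysctl_sb_iostats_enabled,
                .maxlen         = sizeof(int),
                .mode           = 0644,
                .proc_handler   = proc_dointvec_minmax,
                .extra1         = SYSCTL_ZERO,
                .extra2         = SYSCTL_ONE,
        },
        { }     /* sentinel */
};

static int __init sb_iostats_sysctl_init(void)
{
        register_sysctl("fs", sb_iostats_sysctls);
        return 0;
}
fs_initcall(sb_iostats_sysctl_init);

That would let a distro build with the default on, while still giving
admins a one-line way to turn it off at runtime.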

Thanks,
Amir.


