On Thu, May 06, 2021 at 08:29:40AM +1000, Balbir Singh wrote:
> On Wed, May 05, 2021 at 12:59:40PM +0200, Peter Zijlstra wrote:
> > Hi,
> >
> > Due to:
> >
> > https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@xxxxxxxxxx
> >
> > and general principle, delayacct really shouldn't be using ktime (pvclock also
> > really shouldn't be doing what it does, but that's another story). This led me
> > to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.
> >
> > The rest of the patches are an attempt at simplifying all that a little. All
> > that crud is enabled by default for distros which is leading to a death by a
> > thousand cuts.
> >
> > The last patch is an attempt at default disabling DELAYACCT, because I don't
> > think anybody actually uses that much, but what do I know, there were no ill
> > effects on my testbox. Perhaps we should mirror
> > /proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
> > frobbing.
> >
>
> There are tools like iotop that use delayacct to display information.

Right, but how many actual people use that? Does that justify saddling
the whole sodding world with the overhead?

> When the
> code was checked in, we did run SPEC* back in the day 2006 to find overheads,
> nothing significant showed. Do we have any data on the overhead you're seeing?

I've not looked, but having it disabled saves that per-task allocation
and that spinlock in delayacct_end() for iowait wakeups and a bunch of
cache misses of course.

I doubt SPEC is a benchmark that tickles those paths much if at all.

The thing is; we can't just keep growing more and more stats, that'll
kill us quite dead.