> I experimented with all-groups, per-domain counter aggregation files
> prototype using this change as a starting point.
>
> I'm happy to report that the values reported looked fairly reasonable.
>
> Tested-by: Peter Newman <peternewman@xxxxxxxxxx>

Thanks for the test report.

> On an AMD EPYC 7B12 64-Core Processor, I saw a consistent 1.021-1.026
> second period. Is this enough error that you would want to divide by
> the actual period instead of assuming a denominator of 1 exactly?
> We're mainly concerned with the relative bandwidth of jobs, so this
> error isn't much concern as long as it doesn't favor any group.

I see pretty much the same delta_t on Intel Icelake. We could use jiffies
to get a bit more precision (depending on HZ value).

> The only thing I'd worry about is if the user is using setitimer() to
> keep a consistent 1 second period for reading the bandwidth rate, the
> window of the resctrl updates would drift away from the userspace
> consumer over time.

One other thing I did in my resctrl2 summary code was to patch the
modification time of the summary file to when the kernel ran
mbm_handle_overflow(). That would allow users to check the update time
to stay in sync with kernel updates.

-Tony
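
For illustration only, here is a rough userspace sketch of how a consumer
might combine the two ideas above: take the real update period from
successive modification times of the summary file, then rescale a rate
that was computed with an assumed 1 second denominator. It assumes the
mtime patch is applied. The helper names are hypothetical and nothing here
is an existing resctrl interface.

```python
import os

# Assumption: the kernel-side rate was computed by dividing by exactly 1 second.
NOMINAL_PERIOD_S = 1.0


def read_mtime_ns(path):
    """Return the file's modification time in integer nanoseconds."""
    return os.stat(path).st_mtime_ns


def period_from_mtimes(prev_mtime_ns, cur_mtime_ns):
    """Seconds between two kernel updates, taken from successive mtimes."""
    if cur_mtime_ns <= prev_mtime_ns:
        raise ValueError("no new update since the previous sample")
    return (cur_mtime_ns - prev_mtime_ns) / 1e9


def corrected_rate(reported_rate, actual_period_s,
                   nominal_period_s=NOMINAL_PERIOD_S):
    """Rescale a rate computed over nominal_period_s to the measured period."""
    if actual_period_s <= 0:
        raise ValueError("actual period must be positive")
    return reported_rate * nominal_period_s / actual_period_s
```

A reader would call read_mtime_ns() on the summary file (whatever path it
ends up with) before each read and skip the sample if the mtime has not
advanced. With the 1.021-1.026 second period reported above, the
uncorrected rate overstates bandwidth by roughly 2.1-2.6%, applied equally
to every group.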