RE: [PATCH v9 4/9] x86/resctrl: Compute memory bandwidth for all supported events

"Luck, Tony" <tony.luck@xxxxxxxxx> · Fri, 15 Nov 2024 16:59:47 +0000

> I experimented with a prototype of all-groups, per-domain counter
> aggregation files, using this change as a starting point.
>
> I'm happy to report that the values reported looked fairly reasonable.
>
> Tested-by: Peter Newman <peternewman@xxxxxxxxxx>

Thanks for the test report.

> On an AMD EPYC 7B12 64-Core Processor, I saw a consistent 1.021-1.026
> second period. Is this enough error that you would want to divide by
> the actual period instead of assuming a denominator of 1 exactly?
> We're mainly concerned with the relative bandwidth of jobs, so this
> error isn't much concern as long as it doesn't favor any group.

I see pretty much the same delta_t on Intel Icelake. We could
measure the elapsed time in jiffies and divide by that to get a bit
more precision (how much depends on the HZ value).
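
To make the arithmetic concrete, here is a minimal userspace sketch of
dividing by the measured period instead of assuming exactly one second.
The helper name and its parameters are hypothetical and are not taken
from the patch; "hz" stands in for the kernel's HZ value.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): scale a byte delta by the
 * elapsed time measured in jiffies rather than assuming a 1 second period.
 *
 * delta_bytes   - bytes counted since the previous sample
 * delta_jiffies - jiffies elapsed since the previous sample
 * hz            - jiffies per second (assumed to match the kernel's HZ)
 *
 * The multiply can overflow for very large byte deltas; this sketch
 * ignores that case.
 */
static uint64_t bw_bytes_per_sec(uint64_t delta_bytes,
				 uint64_t delta_jiffies,
				 uint64_t hz)
{
	if (delta_jiffies == 0)
		return 0;	/* no time elapsed, no rate to report */

	return delta_bytes * hz / delta_jiffies;
}
```

With a 1.024 second period, a delta of 1024000 bytes comes out as
1000000 bytes/sec, where assuming a denominator of 1 would report
1024000. At HZ=250 each jiffy is 4 ms, so the measured period is only
that precise.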

> The only thing I'd worry about is if the user is using setitimer() to
> keep a consistent 1 second period for reading the bandwidth rate, the
> window of the resctrl updates would drift away from the userspace
> consumer over time.

One other thing I did in my resctrl2 summary code was to set
the modification time of the summary file to the time the kernel
last ran mbm_handle_overflow(). That would allow users to check the
update time to stay in sync with kernel updates.
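
A userspace consumer could poll that timestamp roughly like this. This
is only a sketch under assumptions: the function name is made up, the
path passed in is whatever the summary file ends up being called, and it
relies on the kernel actually updating the file's mtime as described
above.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: report whether the file at "path" has a different
 * modification time than the one remembered in *last.
 *
 * Returns 1 if the mtime changed (and stores the new value in *last),
 * 0 if it is unchanged, -1 if stat() fails.
 */
static int summary_updated(const char *path, struct timespec *last)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	if (st.st_mtim.tv_sec == last->tv_sec &&
	    st.st_mtim.tv_nsec == last->tv_nsec)
		return 0;

	*last = st.st_mtim;	/* remember the kernel's latest update time */
	return 1;
}
```

The reader would re-read the bandwidth values only when this returns 1,
so its sampling follows the kernel's update window instead of a fixed
setitimer() period.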

-Tony