Hi Tony,

On 10/29/24 10:28 AM, Tony Luck wrote:
> Computing memory bandwidth for all enabled events resulted in
> identical code blocks for total and local bandwidth in mbm_update().
>
> Refactor with a helper function to eliminate code duplication.
>
> No functional change.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
>  arch/x86/kernel/cpu/resctrl/monitor.c | 69 ++++++++++-----------------
>  1 file changed, 24 insertions(+), 45 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 3ef339e405c2..1b6cb3bbc008 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -829,62 +829,41 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
>  	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
>  }
>
> -static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> -		       u32 closid, u32 rmid)
> +static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> +				 u32 closid, u32 rmid, enum resctrl_event_id evtid)
>  {
>  	struct rmid_read rr = {0};
>
>  	rr.r = r;
>  	rr.d = d;
> +	rr.evtid = evtid;
> +	rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> +	if (IS_ERR(rr.arch_mon_ctx)) {
> +		pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> +				    PTR_ERR(rr.arch_mon_ctx));
> +		return;
> +	}
> +
> +	__mon_event_count(closid, rmid, &rr);
> +
> +	if (is_mba_sc(NULL))
> +		mbm_bw_count(closid, rmid, &rr);
> +

As I am staring at this more, there seems to be an existing issue
here ... note how __mon_event_count()'s return value is not checked
before mbm_bw_count() is called. This means that mbm_bw_count() may run
with an rr.val of 0, which results in wraparound inside it and, in turn,
some unexpected bandwidth numbers. Since a counter read can fail with an
"Unavailable"/"Error" from hardware, it is not deterministic how
frequently this issue can be encountered.
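To illustrate the wraparound, here is a minimal standalone sketch. It is
not the kernel code: struct bw_state and bw_count() are hypothetical
stand-ins, and it assumes the bandwidth is derived from an unsigned
difference between the current and the previous byte count.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-RMID state; not the kernel's struct. */
struct bw_state {
	uint64_t prev_bytes;	/* byte count seen at the previous update */
	uint64_t prev_bw;	/* last computed bandwidth, in MiB per interval */
};

/*
 * Assumed shape of the computation: an unsigned delta of byte counts.
 * If cur_bytes is 0 because the read failed, the subtraction wraps
 * modulo 2^64 whenever prev_bytes is non-zero.
 */
static void bw_count(struct bw_state *m, uint64_t cur_bytes)
{
	uint64_t bytes = cur_bytes - m->prev_bytes;

	m->prev_bytes = cur_bytes;
	m->prev_bw = bytes >> 20;	/* bytes to MiB */
}
```

With a successful read of 512 MiB followed by a failed read that leaves
the value at 0, the second call computes (2^64 - 2^29) >> 20, an enormous
bogus bandwidth rather than a small or zero one.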

Skipping mbm_bw_count() if rr.val is 0 is one option ... that would keep
the bandwidth measurement static at whatever was the last successful
read and thus not cause dramatic changes by the software controller ...
setting bandwidth to 0 if rr.val is 0 is another option to reflect that
bandwidth data is unavailable, but then the software controller should
perhaps get a signal to not make adjustments?

I expect there are better options? What do you think?

Reinette