Hi Haifeng, On 3/7/2024 11:41 PM, Haifeng Xu wrote: > In our production environment, after removing monitor groups, those unused > RMIDs get stuck in the limbo list forever because their llc_occupancy are > always larger than the threshold. But the unused RMIDs can be successfully > freed by turning up the threshold. > > In order to know how much the threshold should be, perf can be used to > acquire the llc_occupancy of RMIDs in each rdt domain. > > Instead of using perf tool to track llc_occupancy and filter the log > manually, it is more convenient for users to use tracepoint to do this > work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs > when scanning the limbo list. > > Signed-off-by: Haifeng Xu <haifeng.xu@xxxxxxxxxx> > Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx> > Suggested-by: James Morse <james.morse@xxxxxxx> > Reviewed-by: James Morse <james.morse@xxxxxxx> > --- > Documentation/arch/x86/resctrl.rst | 8 ++++++++ > arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++++ > arch/x86/kernel/cpu/resctrl/trace.h | 16 ++++++++++++++++ > 3 files changed, 33 insertions(+) > > diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst > index a6279df64a9d..dd3507dc765c 100644 > --- a/Documentation/arch/x86/resctrl.rst > +++ b/Documentation/arch/x86/resctrl.rst > @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask > each bit represents 5% of the capacity of the cache. You could partition > the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000. > > +Tracepoint - mon_llc_occupancy_limbo > +------------------------------------ I think that the below paragraph would fit nicely as a new paragraph in the existing "max_threshold_occupancy - generic concepts" section. To support that just one change to text below ... > +This tracepoint gives you the precise occupancy values for a subset of RMID The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes for a subset of RMID ... > +that are not immediately available for allocation. This can't be relied on > +to produce output every second, it may be necessary to attempt to create an > +empty monitor group to force an update. Output may only be produced if creation > +of a control or monitor group fails. > + > Memory bandwidth Allocation and monitoring > ========================================== > > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c > index c34a35ec0f03..60b6a29a9e29 100644 > --- a/arch/x86/kernel/cpu/resctrl/monitor.c > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c > @@ -24,6 +24,7 @@ > #include <asm/resctrl.h> > > #include "internal.h" > +#include "trace.h" > > /** > * struct rmid_entry - dirty tracking for all RMID. > @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free) > rmid_dirty = true; > } else { > rmid_dirty = (val >= resctrl_rmid_realloc_threshold); > + > + /* x86's CLOSID and RMID are independent numbers, so the entry's > + * closid is a invalid CLOSID. But on arm64, the RMID value isn't > + * a unique number for each CLOSID. It's necessary to track both > + * CLOSID and RMID because there may be dependencies between each > + * other on some architectures. > + */ Please watch for proper formatting of multi-line comment and consistent capitalization. I also think comment can be more accurate, for example: /* * x86's CLOSID and RMID are independent numbers, so the entry's * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used * to select the configuration. It is thus necessary to track both * CLOSID and RMID because there may be dependencies between them * on some architectures. */ > + trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val); > } > > if (force_free || !rmid_dirty) { > diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h > index ed5c66b8ab0b..b310b4985b94 100644 > --- a/arch/x86/kernel/cpu/resctrl/trace.h > +++ b/arch/x86/kernel/cpu/resctrl/trace.h > @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3, > TP_printk("hits=%llu miss=%llu", > __entry->l3_hits, __entry->l3_miss)); > > +TRACE_EVENT(mon_llc_occupancy_limbo, > + TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes), > + TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes), > + TP_STRUCT__entry(__field(u32, ctrl_hw_id) > + __field(u32, mon_hw_id) > + __field(int, domain_id) > + __field(u64, llc_occupancy_bytes)), > + TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id; > + __entry->mon_hw_id = mon_hw_id; > + __entry->domain_id = domain_id; > + __entry->llc_occupancy_bytes = llc_occupancy_bytes;), > + TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu", domain_d -> domain_id > + __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id, > + __entry->llc_occupancy_bytes) > + ); > + > #endif /* _TRACE_RESCTRL_H */ > > #undef TRACE_INCLUDE_PATH Reinette