On 2024/3/14 06:47, Reinette Chatre wrote: > Hi Haifeng, > > On 3/7/2024 11:41 PM, Haifeng Xu wrote: >> In our production environment, after removing monitor groups, those unused >> RMIDs get stuck in the limbo list forever because their llc_occupancy are >> always larger than the threshold. But the unused RMIDs can be successfully >> freed by turning up the threshold. >> >> In order to know how much the threshold should be, perf can be used to >> acquire the llc_occupancy of RMIDs in each rdt domain. >> >> Instead of using perf tool to track llc_occupancy and filter the log >> manually, it is more convenient for users to use tracepoint to do this >> work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs >> when scanning the limbo list. >> >> Signed-off-by: Haifeng Xu <haifeng.xu@xxxxxxxxxx> >> Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx> >> Suggested-by: James Morse <james.morse@xxxxxxx> >> Reviewed-by: James Morse <james.morse@xxxxxxx> >> --- >> Documentation/arch/x86/resctrl.rst | 8 ++++++++ >> arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++++ >> arch/x86/kernel/cpu/resctrl/trace.h | 16 ++++++++++++++++ >> 3 files changed, 33 insertions(+) >> >> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst >> index a6279df64a9d..dd3507dc765c 100644 >> --- a/Documentation/arch/x86/resctrl.rst >> +++ b/Documentation/arch/x86/resctrl.rst >> @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask >> each bit represents 5% of the capacity of the cache. You could partition >> the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000. >> >> +Tracepoint - mon_llc_occupancy_limbo >> +------------------------------------ > > I think that the below paragraph would fit nicely as a new paragraph in the > existing "max_threshold_occupancy - generic concepts" section. To support that > just one change to text below ... > >> +This tracepoint gives you the precise occupancy values for a subset of RMID > > The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes > for a subset of RMID ... > OK, I'll move the descriptions to "max_threshold_occupancy - generic concepts" section. >> +that are not immediately available for allocation. This can't be relied on >> +to produce output every second, it may be necessary to attempt to create an >> +empty monitor group to force an update. Output may only be produced if creation >> +of a control or monitor group fails. >> + >> Memory bandwidth Allocation and monitoring >> ========================================== >> >> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c >> index c34a35ec0f03..60b6a29a9e29 100644 >> --- a/arch/x86/kernel/cpu/resctrl/monitor.c >> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c >> @@ -24,6 +24,7 @@ >> #include <asm/resctrl.h> >> >> #include "internal.h" >> +#include "trace.h" >> >> /** >> * struct rmid_entry - dirty tracking for all RMID. >> @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free) >> rmid_dirty = true; >> } else { >> rmid_dirty = (val >= resctrl_rmid_realloc_threshold); >> + >> + /* x86's CLOSID and RMID are independent numbers, so the entry's >> + * closid is a invalid CLOSID. But on arm64, the RMID value isn't >> + * a unique number for each CLOSID. It's necessary to track both >> + * CLOSID and RMID because there may be dependencies between each >> + * other on some architectures. >> + */ > > Please watch for proper formatting of multi-line comment and consistent capitalization. > I also think comment can be more accurate, for example: > > /* > * x86's CLOSID and RMID are independent numbers, so the entry's > * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the > * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used > * to select the configuration. It is thus necessary to track both > * CLOSID and RMID because there may be dependencies between them > * on some architectures. > */ > Thanks for your suggestions! >> + trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val); >> } >> >> if (force_free || !rmid_dirty) { >> diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h >> index ed5c66b8ab0b..b310b4985b94 100644 >> --- a/arch/x86/kernel/cpu/resctrl/trace.h >> +++ b/arch/x86/kernel/cpu/resctrl/trace.h >> @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3, >> TP_printk("hits=%llu miss=%llu", >> __entry->l3_hits, __entry->l3_miss)); >> >> +TRACE_EVENT(mon_llc_occupancy_limbo, >> + TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes), >> + TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes), >> + TP_STRUCT__entry(__field(u32, ctrl_hw_id) >> + __field(u32, mon_hw_id) >> + __field(int, domain_id) >> + __field(u64, llc_occupancy_bytes)), >> + TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id; >> + __entry->mon_hw_id = mon_hw_id; >> + __entry->domain_id = domain_id; >> + __entry->llc_occupancy_bytes = llc_occupancy_bytes;), >> + TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu", > > domain_d -> domain_id > >> + __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id, >> + __entry->llc_occupancy_bytes) >> + ); >> + >> #endif /* _TRACE_RESCTRL_H */ >> >> #undef TRACE_INCLUDE_PATH > > > Reinette Thanks!