Re: [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

Hi Haifeng,

On 3/7/2024 11:41 PM, Haifeng Xu wrote:
> In our production environment, after removing monitor groups, those unused
> RMIDs get stuck in the limbo list forever because their llc_occupancy are
> always larger than the threshold. But the unused RMIDs can be successfully
> freed by turning up the threshold.
> 
> In order to know how much the threshold should be, perf can be used to
> acquire the llc_occupancy of RMIDs in each rdt domain.
> 
> Instead of using perf tool to track llc_occupancy and filter the log
> manually, it is more convenient for users to use tracepoint to do this
> work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
> when scanning the limbo list.
> 
> Signed-off-by: Haifeng Xu <haifeng.xu@xxxxxxxxxx>
> Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Suggested-by: James Morse <james.morse@xxxxxxx>
> Reviewed-by: James Morse <james.morse@xxxxxxx>
> ---
>  Documentation/arch/x86/resctrl.rst    |  8 ++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c |  9 +++++++++
>  arch/x86/kernel/cpu/resctrl/trace.h   | 16 ++++++++++++++++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index a6279df64a9d..dd3507dc765c 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
>  each bit represents 5% of the capacity of the cache. You could partition
>  the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>  
> +Tracepoint - mon_llc_occupancy_limbo
> +------------------------------------

I think the paragraph below would fit nicely as a new paragraph in the
existing "max_threshold_occupancy - generic concepts" section. To support that,
just one change to the text below ...

> +This tracepoint gives you the precise occupancy values for a subset of RMID

The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
for a subset of RMID ...

> +that are not immediately available for allocation. This can't be relied on
> +to produce output every second, it may be necessary to attempt to create an
> +empty monitor group to force an update. Output may only be produced if creation
> +of a control or monitor group fails.
> +
>  Memory bandwidth Allocation and monitoring
>  ==========================================
>  
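
For anyone following the advice above to force an update, a minimal userspace
sketch of "create an empty monitor group, then remove it" could look like the
following. It assumes resctrl is mounted at /sys/fs/resctrl, so that monitor
groups of the default control group live in /sys/fs/resctrl/mon_groups. The
helper probe_mon_group() and the group name limbo_probe are hypothetical and
not part of this patch.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create a monitor group named "limbo_probe"
 * under mon_groups_dir and remove it again straight away.
 * Returns 0 if the group was created and removed, otherwise the errno
 * from the failing mkdir()/rmdir() call (ENAMETOOLONG if the path does
 * not fit in the buffer).
 */
static int probe_mon_group(const char *mon_groups_dir)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/limbo_probe", mon_groups_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return ENAMETOOLONG;
	/* Creating the group is what may trigger a limbo scan. */
	if (mkdir(path, 0755) != 0)
		return errno;
	/* Creation succeeded, so clean up the empty group again. */
	if (rmdir(path) != 0)
		return errno;
	return 0;
}
```

On a resctrl mount this would be called as
probe_mon_group("/sys/fs/resctrl/mon_groups"). A non-zero return means mkdir()
failed, which per the text above is the case where output may be produced.
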
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index c34a35ec0f03..60b6a29a9e29 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -24,6 +24,7 @@
>  #include <asm/resctrl.h>
>  
>  #include "internal.h"
> +#include "trace.h"
>  
>  /**
>   * struct rmid_entry - dirty tracking for all RMID.
> @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>  			rmid_dirty = true;
>  		} else {
>  			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
> +
> +			/* x86's CLOSID and RMID are independent numbers, so the entry's
> +			 * closid is a invalid CLOSID. But on arm64, the RMID value isn't
> +			 * a unique number for each CLOSID. It's necessary to track both
> +			 * CLOSID and RMID because there may be dependencies between each
> +			 * other on some architectures.
> +			 */

Please watch for proper formatting of multi-line comments and consistent capitalization.
I also think the comment can be more accurate, for example:

	/*
	 * x86's CLOSID and RMID are independent numbers, so the entry's
	 * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
	 * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used
	 * to select the configuration. It is thus necessary to track both
	 * CLOSID and RMID because there may be dependencies between them
	 * on some architectures.
	 */
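
To make the point of the comment concrete, here is a minimal userspace model
(not kernel code) of where the trace call sits in the limbo scan. The names
struct model_rmid_entry, model_trace() and model_check_limbo() are
hypothetical, MODEL_EMPTY_CLOSID only stands in for X86_RESCTRL_EMPTY_CLOSID,
and the counter read and its error path are left out.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical placeholder standing in for X86_RESCTRL_EMPTY_CLOSID. */
#define MODEL_EMPTY_CLOSID UINT32_MAX

struct model_rmid_entry {
	uint32_t closid;	/* not used on x86, so MODEL_EMPTY_CLOSID */
	uint32_t rmid;
	uint64_t occupancy;	/* bytes read for this RMID in one domain */
};

/* Stand-in for trace_mon_llc_occupancy_limbo(): prints one record. */
static void model_trace(uint32_t closid, uint32_t rmid, int domain_id,
			uint64_t bytes)
{
	printf("ctrl_hw_id=%" PRIu32 " mon_hw_id=%" PRIu32
	       " domain_id=%d llc_occupancy_bytes=%" PRIu64 "\n",
	       closid, rmid, domain_id, bytes);
}

/*
 * Scan the limbo entries of one domain. Each entry is traced with both
 * its CLOSID and RMID, and it stays in limbo while its occupancy is
 * >= threshold. Returns how many entries remain in limbo.
 */
static int model_check_limbo(const struct model_rmid_entry *entries, int n,
			     int domain_id, uint64_t threshold)
{
	int still_dirty = 0;

	for (int i = 0; i < n; i++) {
		const struct model_rmid_entry *e = &entries[i];

		model_trace(e->closid, e->rmid, domain_id, e->occupancy);
		if (e->occupancy >= threshold)
			still_dirty++;
	}
	return still_dirty;
}
```

Passing the CLOSID even though x86 ignores it keeps the record format the same
on architectures where the RMID only identifies a monitor together with the
CLOSID.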

> +			trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
>  		}
>  
>  		if (force_free || !rmid_dirty) {
> diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
> index ed5c66b8ab0b..b310b4985b94 100644
> --- a/arch/x86/kernel/cpu/resctrl/trace.h
> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
> @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
>  	    TP_printk("hits=%llu miss=%llu",
>  		      __entry->l3_hits, __entry->l3_miss));
>  
> +TRACE_EVENT(mon_llc_occupancy_limbo,
> +	    TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
> +	    TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
> +	    TP_STRUCT__entry(__field(u32, ctrl_hw_id)
> +			     __field(u32, mon_hw_id)
> +			     __field(int, domain_id)
> +			     __field(u64, llc_occupancy_bytes)),
> +	    TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
> +			   __entry->mon_hw_id = mon_hw_id;
> +			   __entry->domain_id = domain_id;
> +			   __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
> +	    TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu",

domain_d -> domain_id

> +		      __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
> +		      __entry->llc_occupancy_bytes)
> +	   );
> +
>  #endif /* _TRACE_RESCTRL_H */
>  
>  #undef TRACE_INCLUDE_PATH
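
Since the motivation is to pick a workable threshold, consuming the new
records could be sketched like this. It assumes the field is spelled domain_id
as requested above, that lines are read from trace_pipe (for example
/sys/kernel/tracing/trace_pipe after enabling
events/resctrl/mon_llc_occupancy_limbo), and that domain IDs stay below the
hypothetical bound NR_DOMAINS. record_limbo_line() and min_threshold_to_free()
are illustrative helpers only.

```c
#include <stdio.h>
#include <string.h>

#define NR_DOMAINS 8	/* hypothetical: number of L3 domains tracked */

/*
 * Parse one mon_llc_occupancy_limbo record and raise max_occ[domain_id]
 * if this RMID's occupancy is the largest seen so far in that domain.
 * Returns 1 for a parsable record with an in-range domain, else 0.
 */
static int record_limbo_line(const char *line,
			     unsigned long long max_occ[NR_DOMAINS])
{
	const char *tag = "mon_llc_occupancy_limbo:";
	const char *p = strstr(line, tag);
	unsigned int ctrl_hw_id, mon_hw_id;
	int domain_id;
	unsigned long long bytes;

	if (!p)
		return 0;
	p += strlen(tag);
	if (sscanf(p, " ctrl_hw_id=%u mon_hw_id=%u domain_id=%d llc_occupancy_bytes=%llu",
		   &ctrl_hw_id, &mon_hw_id, &domain_id, &bytes) != 4)
		return 0;
	if (domain_id < 0 || domain_id >= NR_DOMAINS)
		return 0;
	if (bytes > max_occ[domain_id])
		max_occ[domain_id] = bytes;
	return 1;
}

/*
 * An RMID stays in limbo while its occupancy is >= the threshold, so
 * freeing every RMID seen in a domain needs a value strictly above the
 * largest occupancy observed there.
 */
static unsigned long long min_threshold_to_free(unsigned long long max_seen)
{
	return max_seen + 1;
}
```

The kernel may adjust the value written to max_threshold_occupancy, so it is
worth reading the file back to see what threshold is actually in effect.
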


Reinette



