Re: [RFC PATCH 0/2] cpufreq_ext: Introduce cpufreq ext governor

On Fri, Sep 27, 2024 at 06:13:40PM +0800, Yipeng Zou wrote:
> Hi everyone,
> 
> I am currently working on a patch for a CPU frequency governor based on
> BPF, which can use BPF to customize and implement various frequency
> scaling strategies.
> 
> If you have any feedback or suggestions, please do let me know.
> 
> Motivation
> ----------
> 
> 1. Customization
> 
> Existing cpufreq governors in the kernel are designed for general
> scenarios, which may not always be optimal for specific or specialized
> workloads.
> 
> The userspace governor allows direct control over cpufreq, but users
> often require guidance from the kernel to achieve the desired frequency.
> 
> Cpufreq_ext aims to address this by providing a customizable framework that
> can be tailored to the unique needs of different systems and applications.
> 
> While cpufreq governors can be implemented within a kernel module,
> maintaining a ko tailored for specific scenarios can be challenging.
> The complexity and overhead associated with kernel modules make it
> difficult to quickly adapt and deploy custom frequency scaling strategies.
> 
> Cpufreq_ext leverages BPF to offer a more lightweight and flexible approach
> to implementing customized strategies, allowing for easier maintenance and
> deployment.
> 
> 2. Integration with sched_ext:
> 
> sched_ext is a scheduler class whose behavior can be defined by a set of
> BPF programs - the BPF scheduler.
> 
> See [1] for more about sched_ext:
> 
> 	[1] https://www.kernel.org/doc/html/next/scheduler/sched-ext.html
> 
> The interaction between CPU frequency scaling and task scheduling is
> critical for performance.
> 
> cpufreq_ext can work with sched_ext to ensure that both scheduling
> decisions and frequency adjustments are made in a coordinated manner,
> optimizing system responsiveness and power consumption.

Hi Yipeng, I prototyped something really similar earlier this year and
the conclusion I came to was that a governor might not be the right
abstraction for struct_ops. One issue is that, depending on the frequency
driver being used, the driver may include its own governor implementation
(e.g. intel_pstate). For sched_ext there is already a kfunc
(scx_bpf_cpuperf_set) which calls into cpufreq_update_util, and that
has been working well so far.
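
To make the "relative performance target" idea concrete, here is a small
userspace sketch (not kernel or BPF code). scx_bpf_cpuperf_set takes a
target in the range [0, SCX_CPUPERF_ONE]; the helper names below and the
linear mapping from target to frequency are assumptions made only for
illustration, and the real translation is up to the frequency driver.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for SCX_CPUPERF_ONE (1024, i.e. SCHED_CAPACITY_SCALE). */
#define PERF_ONE 1024u

/* Hypothetical helper: scale a utilization value to a relative perf target. */
uint32_t perf_from_util(uint64_t util, uint64_t capacity)
{
	assert(capacity > 0);
	if (util >= capacity)
		return PERF_ONE;
	return (uint32_t)(util * PERF_ONE / capacity);
}

/*
 * Assumed linear mapping from a perf target to a frequency in kHz,
 * clamped to [min_khz, max_khz]. Real drivers may map this differently.
 */
uint32_t freq_from_perf(uint32_t perf, uint32_t min_khz, uint32_t max_khz)
{
	uint64_t khz;

	assert(min_khz <= max_khz);
	if (perf > PERF_ONE)
		perf = PERF_ONE;
	khz = (uint64_t)max_khz * perf / PERF_ONE;
	return khz < min_khz ? min_khz : (uint32_t)khz;
}
```

The point is that the BPF side only states how much performance it wants
and leaves the choice of an actual frequency to the driver.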

> Overview
> --------
> 
> cpufreq ext is a BPF-based cpufreq governor; the governor's behavior
> can be customized in a BPF program.
> 
> cpufreq ext works like any other cpufreq governor attached to a policy.
> 
> 		   --------------------------
> 		  |        BPF governor      |
> 		   --------------------------
> 			       |
> 			       v
> 			  BPF Register
> 			       |
> 			       v
> 	    --------------------------------------
> 	   |             CPUFreq ext              |
> 	    --------------------------------------
> 	      ^                ^               ^
> 	      |                |               |
> 	   ---------       ---------       ---------
> 	  | policy0 | ... | policy1 | ... | policyn |
> 	   ---------       ---------       ---------
> 
> We can register several function hooks with cpufreq ext through BPF
> struct_ops.
> 
> The first patch defines a dbs_governor, which works like the other
> governors.
> 
> The second patch gives a sample of how to use it: it implements a
> typical cpufreq governor that switches to the maximum frequency when a
> VIP task is running on the target CPU.
> 
> Detail
> ------
> 
> cpufreq ext uses bpf_struct_ops to register several function hooks.
> 
> 	struct cpufreq_governor_ext_ops {
> 		...
> 	}
> 
> struct cpufreq_governor_ext_ops defines all the functions that a BPF
> program can implement.
> 
> If you need to add a custom function, you only need to define it in this
> struct.
> 
> At the moment we have defined the basic functions.
> 
> 1. unsigned long (*get_next_freq)(struct cpufreq_policy *policy)
> 
> 	Decide how to adjust the CPU frequency here.
> 	The return value is the CPU frequency that will be applied.
> 
> 2. unsigned int (*get_sampling_rate)(struct cpufreq_policy *policy)
> 
> 	Decide how to adjust the sampling rate here.
> 	The return value is the governor sampling rate that will be
> 	applied.
> 

Why does the governor need a sampling rate? Could this be done with a
bpf timer instead?
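
As a rough model of what that would look like: with a BPF timer, the
callback re-arms itself (bpf_timer_start from inside the callback) with
whatever period it wants, so no separate sampling-rate hook is needed.
The userspace sketch below only simulates that pattern on virtual time;
tick_fn, adaptive_tick and run_timer are made-up names, and the 1 ms /
10 ms periods are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns the delay in ns until its next run, 0 to stop. */
typedef uint64_t (*tick_fn)(void *ctx, uint64_t now_ns);

struct adaptive_state {
	unsigned int load_pct;	/* pretend load sample, 0..100 */
	unsigned int ticks;	/* number of times the callback ran */
};

/* Picks its own cadence: 1 ms when busy, 10 ms when mostly idle. */
uint64_t adaptive_tick(void *ctx, uint64_t now_ns)
{
	struct adaptive_state *st = ctx;

	(void)now_ns;
	st->ticks++;
	return st->load_pct >= 50 ? 1000000ULL : 10000000ULL;
}

/* Drive the callback on virtual time until horizon_ns; returns the run count. */
unsigned int run_timer(tick_fn fn, void *ctx, uint64_t horizon_ns)
{
	uint64_t now = 0;
	unsigned int runs = 0;

	assert(fn != NULL);
	while (now < horizon_ns) {
		uint64_t delay = fn(ctx, now);

		runs++;
		if (!delay)
			break;	/* callback chose not to re-arm */
		now += delay;
	}
	return runs;
}
```

With load_pct = 80 the callback runs ten times in a 10 ms window; with
load_pct = 10 it runs three times in 30 ms.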

> 3. unsigned int (*init)(void)
> 
> 	BPF governor init callback, return 0 means success.
> 
> 4. void (*exit)(void)
> 
> 	BPF governor exit callback.
> 
> 5. char name[CPUFREQ_EXT_NAME_LEN]
> 
> 	BPF governor name.
> 
I'm guessing it would be useful to have the governor dispatch on almost
all the governor methods. IIRC I had something like:

	int	(*start)(struct cpufreq_policy *policy);
	void	(*stop)(struct cpufreq_policy *policy);
	void	(*limits)(struct cpufreq_policy *policy);
	int	(*store_setspeed)(struct cpufreq_policy *policy,
				  unsigned int freq);
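
A userspace mock of that kind of dispatch, to show the shape rather than
the actual patch: every callback is optional, only one ops table can be
registered at a time, and the dispatcher falls back to the maximum
frequency when nothing is registered. struct policy, struct gov_ext_ops
and the gov_ext_* functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for struct cpufreq_policy; only the fields used here. */
struct policy {
	unsigned int min;	/* kHz */
	unsigned int max;	/* kHz */
	unsigned int cur;	/* kHz */
};

/* Hypothetical ops table; any callback may be left NULL. */
struct gov_ext_ops {
	int		(*start)(struct policy *p);
	void		(*stop)(struct policy *p);
	void		(*limits)(struct policy *p);
	unsigned long	(*get_next_freq)(struct policy *p);
};

const struct gov_ext_ops *active_ops;	/* NULL: nothing registered */

/* Only one ops table may be registered at a time. */
int gov_ext_register(const struct gov_ext_ops *ops)
{
	if (!ops)
		return -EINVAL;
	if (active_ops)
		return -EBUSY;
	active_ops = ops;
	return 0;
}

void gov_ext_unregister(void)
{
	active_ops = NULL;
}

/* Dispatch one update; without a hook, behave like "performance". */
unsigned int gov_ext_update(struct policy *p)
{
	unsigned long next = p->max;

	assert(p->min <= p->max);
	if (active_ops && active_ops->get_next_freq)
		next = active_ops->get_next_freq(p);

	/* Never trust the hook: clamp to the policy limits. */
	if (next < p->min)
		next = p->min;
	if (next > p->max)
		next = p->max;
	p->cur = (unsigned int)next;
	return p->cur;
}

/* Sample hook: always ask for the lowest frequency. */
unsigned long powersave_next_freq(struct policy *p)
{
	return p->min;
}
```

The start/stop/limits slots would be dispatched the same way, with a
NULL check and a default when the program does not implement them.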

> cpufreq_ext also adds a sysfs interface that reports the governor status.
> 
> 1. ext/stat attribute:
> 
> 	Access to current BPF governor status.
> 
> 	# cat /sys/devices/system/cpu/cpufreq/ext/stat
> 	Stat: CPUFREQ_EXT_INIT
> 	BPF governor: performance
> 
> There are a number of constraints on cpufreq_ext:
> 
> 1. Only one ext governor can be registered at a time.
> 
> 2. By default, it operates as a performance governor when no BPF
>    governor is registered.
> 
> 3. The cpufreq_ext governor must be selected before loading a BPF
>    governor; otherwise, the installation of the BPF governor will fail.
> 
> TODO
> ----
> 
> The current patch is a starting point, and future work will focus on
> expanding its capabilities.
> 
> I plan to leverage the BPF ecosystem to introduce innovative features,
> such as real-time adjustments and optimizations based on system-wide
> observations and analytics.
> 
> And I am looking forward to any insights, critiques, or suggestions you
> may have.
> 
> Yipeng Zou (2):
>   cpufreq_ext: Introduce cpufreq ext governor
>   cpufreq_ext: Add bpf sample
> 
>  drivers/cpufreq/Kconfig        |  23 ++
>  drivers/cpufreq/Makefile       |   1 +
>  drivers/cpufreq/cpufreq_ext.c  | 525 +++++++++++++++++++++++++++++++++
>  samples/bpf/.gitignore         |   1 +
>  samples/bpf/Makefile           |   8 +-
>  samples/bpf/cpufreq_ext.bpf.c  | 113 +++++++
>  samples/bpf/cpufreq_ext_user.c |  48 +++
>  7 files changed, 718 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/cpufreq/cpufreq_ext.c
>  create mode 100644 samples/bpf/cpufreq_ext.bpf.c
>  create mode 100644 samples/bpf/cpufreq_ext_user.c
> 
> -- 
> 2.34.1
> 



