Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB),
which can result in overheating or excess power usage. To avoid this, the
kernel overrides any boot-time EPB "performance" bias to "normal".

In data centers, it is preferable to keep the EPB at "performance" when
live-updating the host kernel via a kexec into the new kernel. Boot time
is critical during the kexec because running guest VMs perceive it as
latency or downtime.

On Intel Xeon Ice Lake platforms, it has been observed that the
combination of EPB being set to "normal" and HWP (Intel Hardware
P-states) being enabled/configured during or close to the kexec increases
the live-update/kexec downtime by a factor of 7 compared to when EPB is
set to "performance".

Introduce a command-line parameter, "intel_epb=preserve", to skip the
"performance" -> "normal" override/workaround. This keeps the existing
behaviour when the parameter is not set, while allowing users to stay at
"performance" for a faster kexec.

Signed-off-by: Jack Allister <jalliste@xxxxxxxxxx>
Acked-by: Rafael J. Wysocki <rafael@xxxxxxxxxx>
Cc: Paul Durrant <pdurrant@xxxxxxxxxx>
Cc: Jue Wang <juew@xxxxxxxxxx>
Cc: Usama Arif <usama.arif@xxxxxxxxxxxxx>
---
 .../admin-guide/kernel-parameters.txt |  9 ++++++++
 arch/x86/kernel/cpu/intel_epb.c       | 22 +++++++++++++++++--
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..d28f2fc41c0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2148,6 +2148,15 @@
 			0	disables intel_idle and fall back on acpi_idle.
 			1 to 9	specify maximum depth of C-state.
 
+	intel_epb=	[X86]
+			auto (default)
+			  Work around buggy BIOSes to avoid excess power usage
+			  by forcing the performance bias to "normal" at boot-time.
+			preserve
+			  Do not override the existing performance bias setting.
+			  Useful if a previous kernel or bootloader's setting is
+			  more desirable than "normal".
+
 	intel_pstate=	[X86]
 			disable
 			  Do not enable intel_pstate as the default
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..01d406177751 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway. That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb=preserve can be set to enforce it.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,21 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int parse_intel_epb(char *str)
+{
+	if (!str)
+		return 0;
+
+	/* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
+	if (!strcmp(str, "preserve"))
+		intel_epb_no_override = true;
+
+	/* "intel_epb=auto" not explicitly checked as default behaviour. */
+	return 0;
+}
+
+early_param("intel_epb", parse_intel_epb);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1