Re: [PATCH] powercap/intel_rapl: refine RAPL error handling to respect initial CPU matching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 30, 2019 at 3:37 PM Harry Pan <harry.pan@xxxxxxxxx> wrote:
>
> RAPL MMIO support depends on RAPL common driver, in case a new generation
> of CPU is booting but not in the RAPL support list, the processor_thermal
> driver invokes CPU hotplug API to enforce RAPL common driver adding new
> RAPL domain which would cause kernel crash by null pointer dereference
> because the internal RAPL domain resource mapping is not initialized after
> the common init.
>
> Add error handling to detect non initialized RAPL domain resource mapping
> and return error code to the caller; such that, it avoids early crash for
> new CPU and leave error messages through processor_thermal driver.
>
> Before:
> [    4.188566] BUG: kernel NULL pointer dereference, address: 0000000000000020
> ...snip...
> [    4.189555] RIP: 0010:rapl_add_package+0x223/0x574
> [    4.189555] Code: b5 a0 31 c0 49 8b 4d 78 48 01 d9 48 8b 0c c1 49 89 4c c6 10 48 ff c0 48 83 f8 05 75 e7 49 83 ff 03 75 15 48 8b 05 09 bc 18 01 <8b> 70 20 41 89 b6 0c 05 00 00 85 f6 75 1a 49 81 c6 18 9
> [    4.189555] RSP: 0000:ffffb3adc00b3d90 EFLAGS: 00010246
> [    4.189555] RAX: 0000000000000000 RBX: 0000000000000098 RCX: 0000000000000000
> [    4.267161] usb 1-1: New USB device found, idVendor=2109, idProduct=2812, bcdDevice= b.e0
> [    4.189555] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff9340caafd000
> [    4.189555] RBP: ffffb3adc00b3df8 R08: ffffffffa0246e28 R09: ffff9340caafc000
> [    4.189555] R10: 000000000000024a R11: ffffffff9ff1f6f2 R12: 00000000ffffffed
> [    4.189555] R13: ffff9340caa94800 R14: ffff9340caafc518 R15: 0000000000000003
> [    4.189555] FS:  0000000000000000(0000) GS:ffff9340ce200000(0000) knlGS:0000000000000000
> [    4.189555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.189555] CR2: 0000000000000020 CR3: 0000000302c14001 CR4: 00000000003606f0
> [    4.189555] Call Trace:
> [    4.189555]  ? __switch_to_asm+0x40/0x70
> [    4.189555]  rapl_mmio_cpu_online+0x47/0x64
> [    4.189555]  ? rapl_mmio_write_raw+0x33/0x33
> [    4.281059] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [    4.189555]  cpuhp_invoke_callback+0x29f/0x66f
> [    4.189555]  ? __schedule+0x46d/0x6a0
> [    4.189555]  cpuhp_thread_fun+0xb9/0x11c
> [    4.189555]  smpboot_thread_fn+0x17d/0x22f
> [    4.297006] usb 1-1: Product: USB2.0 Hub
> [    4.189555]  ? cpu_report_death+0x43/0x43
> [    4.189555]  kthread+0x137/0x13f
> [    4.189555]  ? cpu_report_death+0x43/0x43
> [    4.189555]  ? kthread_blkcg+0x2e/0x2e
> [    4.312951] usb 1-1: Manufacturer: VIA Labs, Inc.
> [    4.189555]  ret_from_fork+0x1f/0x40
> [    4.189555] Modules linked in:
> [    4.189555] CR2: 0000000000000020
> [    4.189555] ---[ end trace 01bb812aabc791f4 ]---
>
> After:
> [    0.787125] intel_rapl_common: driver does not support CPU family 6 model 166
> ...snip...
> [    4.245273] proc_thermal 0000:00:04.0: failed to add RAPL MMIO interface
>
> Note:
> This example above is on a v5.4 branch without below two CML commits yet:
> commit f84fdcbc8ec0 ("powercap/intel_rapl: add support for Cometlake desktop")
> commit cae478114fbe ("powercap/intel_rapl: add support for CometLake Mobile")
>
> Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support for MMIO RAPL")
>
> Cc: <stable@xxxxxxxxxxxxxxx> # v5.3+
> Signed-off-by: Harry Pan <harry.pan@xxxxxxxxx>

Applied as a fix for 5.5-rc with rewritten subject and changelog (new
subject: "powercap: intel_rapl: add NULL pointer check to
rapl_mmio_cpu_online()").

> ---
>
>  drivers/powercap/intel_rapl_common.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
> index 318d023a6a11..aa0a8de413b1 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -1294,6 +1294,9 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
>         struct cpuinfo_x86 *c = &cpu_data(cpu);
>         int ret;
>
> +       if (!rapl_defaults)
> +               return ERR_PTR(-ENODEV);
> +
>         rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
>         if (!rp)
>                 return ERR_PTR(-ENOMEM);
> --
> 2.24.1
>



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux