Re: [PATCH] hwmon: coretemp: fix oops on cpu unplug

On Mon, 2012-04-30 at 09:18 -0400, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> 
> coretemp tries to access core_data array beyond bounds on cpu unplug if
> core id of the cpu is more than NUM_REAL_CORES-1.
> 
> BUG: unable to handle kernel NULL pointer dereference at 000000000000013c
> IP: [<ffffffffa00159af>] coretemp_cpu_callback+0x93/0x1ba [coretemp]
> PGD 673e5a067 PUD 66e9b3067 PMD 0
> Oops: 0000 [#1] SMP
> CPU 79
> Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter nf_conntrack_ipv4 nf_defrag_ipv4 ip6_tables xt_state nf_conntrack coretemp crc32c_intel asix tpm_tis pcspkr usbnet iTCO_wdt i2c_i801 microcode mii joydev tpm i2c_core iTCO_vendor_support tpm_bios i7core_edac igb ioatdma edac_core dca megaraid_sas [last unloaded: oprofile]
> 
> Pid: 3315, comm: set-cpus Tainted: G        W    3.4.0-rc5+ #2 QCI QSSC-S4R/QSSC-S4R
> RIP: 0010:[<ffffffffa00159af>]  [<ffffffffa00159af>] coretemp_cpu_callback+0x93/0x1ba [coretemp]
> RSP: 0018:ffff880472fb3d48  EFLAGS: 00010246
> RAX: 0000000000000124 RBX: 0000000000000034 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
> RBP: ffff880472fb3d88 R08: ffff88077fcd36c0 R09: 0000000000000001
> R10: ffffffff8184bc48 R11: 0000000000000000 R12: ffff880273095800
> R13: 0000000000000013 R14: ffff8802730a1810 R15: 0000000000000000
> FS:  00007f694a20f720(0000) GS:ffff88077fcc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000000013c CR3: 000000067209b000 CR4: 00000000000007e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process set-cpus (pid: 3315, threadinfo ffff880472fb2000, task ffff880471fa0000)
> Stack:
>  ffff880277b4c308 0000000000000003 ffff880472fb3d88 0000000000000005
>  0000000000000034 00000000ffffffd1 ffffffff81cadc70 ffff880472fb3e14
>  ffff880472fb3dc8 ffffffff8161f48d ffff880471fa0000 0000000000000034
> Call Trace:
>  [<ffffffff8161f48d>] notifier_call_chain+0x4d/0x70
>  [<ffffffff8107f1be>] __raw_notifier_call_chain+0xe/0x10
>  [<ffffffff81059d30>] __cpu_notify+0x20/0x40
>  [<ffffffff815fa251>] _cpu_down+0x81/0x270
>  [<ffffffff815fa477>] cpu_down+0x37/0x50
>  [<ffffffff815fd6a3>] store_online+0x63/0xc0
>  [<ffffffff813c7078>] dev_attr_store+0x18/0x30
>  [<ffffffff811f02cf>] sysfs_write_file+0xef/0x170
>  [<ffffffff81180443>] vfs_write+0xb3/0x180
>  [<ffffffff8118076a>] sys_write+0x4a/0x90
>  [<ffffffff816236a9>] system_call_fastpath+0x16/0x1b
> Code: 48 c7 c7 94 60 01 a0 44 0f b7 ac 10 ac 00 00 00 31 c0 e8 41 b7 5f e1 41 83 c5 02 49 63 c5 49 8b 44 c4 10 48 85 c0 74 56 45 31 ff <39> 58 18 75 4e eb 1f 49 63 d7 4c 89 f7 48 89 45 c8 48 6b d2 28
> RIP  [<ffffffffa00159af>] coretemp_cpu_callback+0x93/0x1ba [coretemp]
>  RSP <ffff880472fb3d48>
> CR2: 000000000000013c
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> ---
>  drivers/hwmon/coretemp.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 0d3141f..54a70fe 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -709,6 +709,10 @@ static void __cpuinit put_core_offline(unsigned int cpu)
>  
>  	indx = TO_ATTR_NO(cpu);
>  
> +	/* The core id is too big, just return */
> +	if (indx > MAX_CORE_DATA - 1)
> +		return;
> +
>  	if (pdata->core_data[indx] && pdata->core_data[indx]->cpu == cpu)
>  		coretemp_remove_core(pdata, &pdev->dev, indx);
>  
Hi,

Good catch. A couple of problems, though.

First, how many cores are we talking about? We should probably
increase NUM_REAL_CORES as well. Long term, we should get rid of the
dependency on that constant to prevent this problem from happening
again, but that is a different issue.

Second, we'll need the same check in get_core_online(). Otherwise the
platform device can be created for the new core (if the core is
re-enabled) but will never be deleted.
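
As a rough illustration of the second point, here is a minimal userspace
sketch, not the actual driver change. It assumes the constant values shown
(the driver's real values may differ), and core_slot(),
sketch_core_online() and sketch_core_offline() are hypothetical names that
do not exist in coretemp.c. The idea is only that both hotplug paths go
through one shared bounds check, so they cannot disagree about which core
ids fit in core_data:

```c
#include <stddef.h>

/* Assumed values for illustration; the driver's constants may differ. */
#define BASE_SYSFS_ATTR_NO	2
#define NUM_REAL_CORES		16
#define MAX_CORE_DATA		(NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)

/* Stand-in for the driver's per-core data. */
struct temp_data {
	unsigned int cpu;
};

/* Stand-in for the driver's per-package data. */
struct platform_data {
	struct temp_data *core_data[MAX_CORE_DATA];
};

/* Hypothetical helper: array slot for a core id, or -1 if it does not fit. */
static int core_slot(unsigned int core_id)
{
	if (core_id >= NUM_REAL_CORES)
		return -1;
	return (int)(core_id + BASE_SYSFS_ATTR_NO);
}

/* Online path: refuse to register a core that has no slot. */
static int sketch_core_online(struct platform_data *pdata,
			      struct temp_data *td, unsigned int core_id)
{
	int indx = core_slot(core_id);

	if (indx < 0)
		return -1;
	pdata->core_data[indx] = td;
	return 0;
}

/* Offline path: same check, then remove only the matching entry. */
static int sketch_core_offline(struct platform_data *pdata,
			       unsigned int core_id, unsigned int cpu)
{
	int indx = core_slot(core_id);

	if (indx < 0)
		return -1;
	if (pdata->core_data[indx] && pdata->core_data[indx]->cpu == cpu) {
		pdata->core_data[indx] = NULL;
		return 0;
	}
	return -1;
}
```

With these assumed values, core ids 0 through 15 map to slots 2 through
17, and any larger core id is rejected on both paths instead of indexing
past the end of the array.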

Thanks,
Guenter



_______________________________________________
lm-sensors mailing list
lm-sensors@xxxxxxxxxxxxxx
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

