Re: [PATCH v4 1/5] x86: fix list corruption on CPU hotplug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> currently if AP wake up is failed, master CPU marks AP as not present
> in do_boot_cpu() by calling set_cpu_present(cpu, false).
> That leads to following list corruption on the next physical CPU
> hotplug:
> 
> [  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
> [  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
> [  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
> [  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
> [  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
> [  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
> [  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
> [  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
> [  418.178445] Call Trace:
> [  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
> [  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
> [  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
> [  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
> [  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
> [  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
> [  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
> [  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
> [  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
> [  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
> [  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
> [  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
> [  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
> [  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
> ...
> Which is caused by the fact that generic_processor_info() allocates
> logical CPU id by calling:
> 
>  cpu = cpumask_next_zero(-1, cpu_present_mask);
> 
> which returns id of previously failed to wake up CPU, since its bit
> is cleared by do_boot_cpu() and as result register_cpu() tries to
> register another CPU with the same id as already present but failed
> to be onlined CPU.
> 
> Taking in account that AP will not do anything if master CPU failed to
> wake it up, there is no reason to mark that AP as not present and
> break next cpu hotplug attempts. As a side effect of not marking AP
> as not present, user would be allowed to online it again later.
> 
> Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>

Hi Igor,

Sorry for long delay...  Can you please combine patch 1/5 and 2/5?  When
a CPU is marked as present, its APIC ID must be valid.  So, it does not
make sense to separate patch 1/5 and 2/5.  With that change:

Acked-by: Toshi Kani <toshi.kani@xxxxxx>

Thanks,
-Toshi



--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux