On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> currently if AP wake up is failed, master CPU marks AP as not present
> in do_boot_cpu() by calling set_cpu_present(cpu, false).
> That leads to following list corruption on the next physical CPU
> hotplug:
>
> [ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
> [ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
> [ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
> [ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
> [ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
> [ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
> [ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
> [ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
> [ 418.178445] Call Trace:
> [ 418.185811] [<ffffffff8159b22d>] dump_stack+0x49/0x5c
> [ 418.186440] [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
> [ 418.187192] [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
> [ 418.191231] [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
> [ 418.193889] [<ffffffff812f796e>] __list_add+0xbe/0xd0
> [ 418.196649] [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
> [ 418.208610] [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
> [ 418.213831] [<ffffffff812e2ef4>] kobject_add+0x44/0x70
> [ 418.229961] [<ffffffff813e2c60>] device_add+0xd0/0x550
> [ 418.234991] [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
> [ 418.250226] [<ffffffff813e32be>] device_register+0x1e/0x30
> [ 418.255296] [<ffffffff813e82a3>] register_cpu+0xe3/0x130
> [ 418.266539] [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
> [ 418.285845] [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
> ...
> Which is caused by the fact that generic_processor_info() allocates
> logical CPU id by calling:
>
>   cpu = cpumask_next_zero(-1, cpu_present_mask);
>
> which returns id of previously failed to wake up CPU, since its bit
> is cleared by do_boot_cpu() and as result register_cpu() tries to
> register another CPU with the same id as already present but failed
> to be onlined CPU.
>
> Taking in account that AP will not do anything if master CPU failed to
> wake it up, there is no reason to mark that AP as not present and
> break next cpu hotplug attempts. As a side effect of not marking AP
> as not present, user would be allowed to online it again later.
>
> Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>

Hi Igor,

Sorry for long delay... Can you please combine patch 1/5 and 2/5?
When a CPU is marked as present, its APIC ID must be valid. So, it
does not make sense to separate patch 1/5 and 2/5.

With that change:
Acked-by: Toshi Kani <toshi.kani@xxxxxx>

Thanks,
-Toshi
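
For anyone following the thread, the id reuse described in the quoted changelog can be reproduced with a small userspace model. This is only a sketch, not kernel code: struct sim_state, first_zero_cpu(), hot_add_cpu(), wakeup_failed() and second_hot_add_result() are hypothetical names made up for illustration, the present mask is a plain integer, and the "registered" array stands in for the device that register_cpu() leaves behind after the first hot-add.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8   /* hypothetical size, chosen only for this sketch */

/* Userspace stand-ins for cpu_present_mask and the registered CPU devices. */
struct sim_state {
    unsigned int present_mask;   /* bit n set => logical CPU n is present */
    bool registered[NR_CPUS];    /* true => a device for CPU n already exists */
};

/* Models cpumask_next_zero(-1, cpu_present_mask): first clear bit, or NR_CPUS. */
static int first_zero_cpu(unsigned int mask)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!(mask & (1u << cpu)))
            return cpu;
    return NR_CPUS;
}

/*
 * Models one hot-add: allocate a logical id, mark it present, register it.
 * Returns the id on success, -1 if that id is already registered (the case
 * that shows up as list corruption in the trace), -2 if no id is free.
 */
static int hot_add_cpu(struct sim_state *s)
{
    int cpu = first_zero_cpu(s->present_mask);

    if (cpu >= NR_CPUS)
        return -2;
    s->present_mask |= 1u << cpu;
    if (s->registered[cpu])
        return -1;
    s->registered[cpu] = true;
    return cpu;
}

/* Models the AP wake-up failure path; clear_present selects old vs. new behaviour. */
static void wakeup_failed(struct sim_state *s, int cpu, bool clear_present)
{
    if (clear_present)
        s->present_mask &= ~(1u << cpu);   /* old: set_cpu_present(cpu, false) */
}

/* Hot-add a CPU, fail its wake-up, then hot-add another; return the second result. */
static int second_hot_add_result(bool clear_present)
{
    struct sim_state s = { .present_mask = 0x1 };   /* boot CPU 0 is present */
    int first;

    s.registered[0] = true;
    first = hot_add_cpu(&s);
    assert(first == 1);
    wakeup_failed(&s, first, clear_present);
    return hot_add_cpu(&s);
}
```

In this model, second_hot_add_result(true) hands out id 1 a second time and hits the duplicate registration, while second_hot_add_result(false) leaves bit 1 set, so the next hot-add moves on to id 2.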