[PATCH -rt 3.10.x] mce: don't try to wake thread before it exists.

Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx> · Tue, 26 Aug 2014 18:10:53 -0400

If a broken machine with issues raises an MCE irq event real
early in the boot, it can try and wake the -rt specific handler
thread (mce_notify_helper) before it exists.  (It is created
through a device_initcall that happens later in the boot.)  When
this happens, we see the irq, which calls the wake with a null
pointer, which then panics the machine at boot.

The race between the irq event and thread init is as follows:

mce_notify_irq();
  --> mce_notify_work();
        --> wake_up_process(mce_notify_helper);

device_initcall_sync(mcheck_init_device);
  --> mce_notify_work_init();
        --> mce_notify_helper = kthread_run(mce_notify_helper_thread, ...);

So, clearly if the IRQ event happens before the device_initcall,
the mce_notify_helper pointer (at global file scope and hence BSS)
will still be NULL, resulting in the following panic at boot:

  CPU: Physical Processor ID: 0
  CPU: Processor Core ID: 0
  ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
  ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
  mce: CPU supports 22 MCE banks
  CPU0: Thermal monitoring enabled (TM1)
  Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0
  tlb_flushall_shift: 6
  Freeing SMP alternatives: 36k freed
  ACPI: Core revision 20130328
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff8107730d>] wake_up_process+0xd/0x40
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.40-rt40_preempt-rt #1
  Hardware name: Insyde Grantley/Type2 - Board Product Name1, BIOS 05.04.07 04/21/2014
  task: ffffffff81e14440 ti: ffffffff81e00000 task.ti: ffffffff81e00000
  RIP: 0010:[<ffffffff8107730d>]  [<ffffffff8107730d>] wake_up_process+0xd/0x40
  RSP: 0000:ffff88107fc03f68  EFLAGS: 00010086
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000007ffefbff
  RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88107fc03f70 R08: 0000000000000002 R09: 0000000000000003
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff88103f03d100
  R13: ffff880ff4e0c000 R14: ffff88107fc16f00 R15: ffff880ff4e0c000
  FS:  0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000001e0f000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000

  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffff88107fc0ccf0 ffff88107fc03f80 ffffffff8101f900 ffff88107fc03f98
   ffffffff8102169d ffff88107fc0fab0 ffff88107fc03fa8 ffffffff81022051
   ffffffff81e01d48 ffffffff819a8a9a ffffffff81e01bf8 <EOI>  ffffffff81e01d48
  Call Trace:
   <IRQ>
   [<ffffffff8101f900>] mce_notify_irq+0x30/0x40
   [<ffffffff8102169d>] intel_threshold_interrupt+0xbd/0xe0
   [<ffffffff81022051>] smp_threshold_interrupt+0x21/0x40
   [<ffffffff819a8a9a>] threshold_interrupt+0x6a/0x70
   <EOI>
   [<ffffffff8199c57c>] ? __slab_alloc.isra.48+0x39e/0x60c
   [<ffffffff814369d5>] ? acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff811534a8>] ? kmem_cache_free+0xb8/0x2b0
   [<ffffffff81152be4>] kmem_cache_alloc+0x234/0x2e0
   [<ffffffff814369d5>] ? acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff814369d5>] acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff8143523f>] acpi_ps_get_next_arg+0xfe/0x3d3
   [<ffffffff814357a4>] acpi_ps_parse_loop+0x290/0x560
   [<ffffffff814364bc>] acpi_ps_parse_aml+0x98/0x28c
   [<ffffffff8143242c>] acpi_ns_one_complete_parse+0x104/0x124
   [<ffffffff8143247f>] acpi_ns_parse_table+0x33/0x38
   [<ffffffff81431e56>] acpi_ns_load_table+0x4a/0x8c
   [<ffffffff81439d6e>] acpi_load_tables+0xa2/0x176
   [<ffffffff81f4dbf3>] acpi_early_init+0x70/0x100
   [<ffffffff81f1c4e9>] ? check_bugs+0xe/0x2d
   [<ffffffff81f14df2>] start_kernel+0x387/0x3b5
   [<ffffffff81f14874>] ? repair_env_string+0x5c/0x5c
   [<ffffffff81f145ad>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff81f1467b>] x86_64_start_kernel+0xcc/0xcf
  Code: 8b 52 18 e9 9e fc ff ff 48 89 45 c0 e8 cd df 92 00 48 8b 45 c0 eb e5 0f 1f 80 00 00 00 00 e8 fb 04 93 00 55 48 89 e5 53 48 89 fb <48> 8b 07 a8 0c 75 12 48 89 df 31 d2 be 03 00 00 00 e8 ad fb ff
  RIP  [<ffffffff8107730d>] wake_up_process+0xd/0x40
   RSP <ffff88107fc03f68>
  CR2: 0000000000000000
  ---[ end trace 0000000000000001 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

Evidently the hardware has issues, but we can handle this more
gracefully by ignoring the events that happen before the
device_initcall has registered the mce handler thread.

Signed-off-by: Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx>

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b94f38..94860c521fb8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1391,6 +1391,11 @@ static int mce_notify_work_init(void)
 
 static void mce_notify_work(void)
 {
+	if (unlikely(!mce_notify_helper)) {
+		pr_info(HW_ERR "Machine check event before MCE init; ignored\n");
+		return;
+	}
+
 	wake_up_process(mce_notify_helper);
 }
 #else
-- 
2.0.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html