Re: [PATCH][RT] x86: Fix an RT MCE crash

On 06/30/2016 10:51 AM, Steven Rostedt wrote:
On Thu, 30 Jun 2016 09:49:19 -0500
Corey Minyard <minyard@xxxxxxx> wrote:

On 06/30/2016 08:43 AM, Steven Rostedt wrote:
On Thu, 30 Jun 2016 08:24:49 -0500
minyard@xxxxxxx wrote:
From: Corey Minyard <cminyard@xxxxxxxxxx>

On some x86 systems an MCE interrupt would come in before the kernel
was ready for it.  Looking at the latest RT code, it has similar
(but not quite the same) code, except it adds a bool that tells if
MCE handling is initialized.  Add the same bool for older versions.

Signed-off-by: Corey Minyard <cminyard@xxxxxxxxxx>
---
   arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
   1 file changed, 4 insertions(+), 1 deletion(-)

We noticed this issue on a new Broadwell system when we booted RT
on it.  This patch is for 3.10; I'm not sure whether it applies to
other kernel versions.
Do you mean other 'older' versions, and that the versions after 3.10
work without this patch?
I haven't looked at supported kernel versions besides 3.10 and 4.4.
The fix was from the 4.4 version of this code.  This patch fixes
v3.10-rt; I can look at finding which other versions need this.  I
was planning to do this, but I wanted to get the patch out for
comments first.
I'm not an MCE expert (I just Cc'd one though ;-)

Ok.  It's not really an MCE bug per se, just an initialization
order bug.


OK, so you are saying that the fix was from 4.4-rt? I can go and look
for it, and if so, I can add it to the "backport" patches I need to do.
I need to do that soon (backport patches from previous versions). It
may already be in that list.

The fix was from 4.4-rt, but it's not a separate fix.  The 4.4 change is
d21959b8ad98 (x86/mce: use swait queue for mce wakeups)
and it's doing the same thing as the 3.10-rt change
49fe500d2abd (x86/mce: Defer mce wakeups to threads for
PREEMPT_RT).

The 3.10-rt change just doesn't have the bool that fixes the
initialization order issue.

-corey


-- Steve

-corey

-- Steve
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b..7125584 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1365,6 +1365,7 @@ static void __mce_notify_work(void)
 }
 #ifdef CONFIG_PREEMPT_RT_FULL
+static bool notify_work_ready __read_mostly;
 struct task_struct *mce_notify_helper;
 static int mce_notify_helper_thread(void *unused)
@@ -1386,12 +1387,14 @@ static int mce_notify_work_init(void)
 	if (!mce_notify_helper)
 		return -ENOMEM;
+	notify_work_ready = true;
 	return 0;
 }
 static void mce_notify_work(void)
 {
-	wake_up_process(mce_notify_helper);
+	if (notify_work_ready)
+		wake_up_process(mce_notify_helper);
 }
 #else
 static void mce_notify_work(void)
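
The initialization-order guard discussed in this thread can be sketched outside the kernel roughly as follows. This is a minimal user-space illustration under stated assumptions, not the kernel code: `struct helper`, `wake_helper()`, `notify_init()` and `notify()` are hypothetical stand-ins for `struct task_struct`, `wake_up_process()`, `mce_notify_work_init()` and `mce_notify_work()`. It is single-threaded, so it says nothing about ordering between CPUs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct helper {
	int wakeups;
};

static struct helper helper_storage;
static struct helper *notify_helper;	/* NULL until notify_init() runs */
static bool notify_ready;		/* plays the role of notify_work_ready */

/* Stand-in for wake_up_process(); it dereferences its argument,
 * so calling it with a NULL helper would crash. */
void wake_helper(struct helper *h)
{
	h->wakeups++;
}

/* Plays the role of mce_notify_work_init(): the flag is set only
 * after the helper pointer is valid. */
int notify_init(void)
{
	notify_helper = &helper_storage;
	notify_ready = true;
	return 0;
}

/* Plays the role of the patched mce_notify_work(): a notification
 * that arrives before initialization is dropped instead of waking
 * a helper that does not exist yet. Returns true if a wakeup was
 * issued. */
bool notify(void)
{
	if (!notify_ready)
		return false;
	wake_helper(notify_helper);
	return true;
}
```

In this sketch, calling `notify()` before `notify_init()` returns false and leaves `helper_storage.wakeups` at 0; calling it afterwards returns true and increments the counter. Without the `notify_ready` check, the early call would pass a NULL pointer to `wake_helper()`.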
