Re: [PATCH][RT] x86: Fix an RT MCE crash

On Thu, Jun 30, 2016 at 05:47:29PM -0500, Corey Minyard wrote:
> You are right, I enabled that on the tip of master and I get the
> following spewing out for a while:
>
> EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
> (channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 -  OVERFLOW
> area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)
>
> So there's apparently something broken in the hardware.

Yeah, DIMM0 on your socket 0 is generating a bunch of correctable errors
and might go bad soon, the stress being on "might". You could replace
it.
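
As an aside, here is a minimal Python sketch for pulling the error count
and DIMM label out of such a record, e.g. to see whether the CE count
keeps climbing. The regexes are fitted only to the one sample record
quoted above (joined onto a single line), and parse_edac_line is a
hypothetical helper name, not an existing tool; other EDAC drivers may
format their records differently.

```python
import re

# Sample record copied from the quoted report, joined onto one line.
SAMPLE = (
    "EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 "
    "(channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 -  OVERFLOW "
    "area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)"
)

# Assumption: the header looks like "EDAC MC<n>: <count> CE|UE <message> on <label>".
HEADER = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) (?P<msg>.+?) on (?P<label>\S+)"
)
# key:value pairs inside the parenthesised detail part.
FIELD = re.compile(r"(\w+):(\S+)")


def parse_edac_line(line):
    """Return a dict describing one EDAC record, or None if it does not match."""
    m = HEADER.search(line)
    if m is None:
        return None
    details = line[m.end():]
    return {
        "mc": int(m.group("mc")),
        "count": int(m.group("count")),
        "kind": m.group("kind"),  # "CE" = correctable, "UE" = uncorrectable
        "message": m.group("msg"),
        "label": m.group("label"),
        # Strip the closing parenthesis that sticks to the last value.
        "fields": {key: value.rstrip(")") for key, value in FIELD.findall(details)},
    }


record = parse_edac_line(SAMPLE)
```

With the sample above, record["count"] is the number of correctable
errors reported and record["label"] names the DIMM; all values under
record["fields"] are left as strings.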

> That sounds like a bit much.

Actually, you probably would need only a couple:

1. 648ed94038c0 ("x86/mce: Provide a lockless memory pool to save error records")

2. 061120aed708 ("x86/mce: Don't use percpu workqueues")
 - that one is unrelated but should be nice for RT as it gets rid of percpu
   workqueues and I know RT hates them :)

3. fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors")
 - this one connects the genpool to MCE

4. f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
 - and this is the last one which I meant earlier.

So that's 4 patches, more or less.

Now, you're in the perfect position to test those because you *actually*
have a real-life system which generates those errors, so it is the
ideal candidate for testing the backports. And you should test them
with the failing DIMM still in place, of course.

HTH.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.