Re: [PATCH][RT] x86: Fix an RT MCE crash


On 06/30/2016 03:34 PM, Borislav Petkov wrote:
On Thu, Jun 30, 2016 at 02:44:42PM -0500, Corey Minyard wrote:
I don't think they are.  I think there is something about this
particular board.  We aren't having any issues with other systems.
Right, so the fact that it raises the thresholding interrupt could
mean that it generates a bunch of correctable ECC errors and it hits a
threshold which is signalled by that interrupt.

And if that is true, then you should be seeing some errors in mcelog or
sb_edac reporting some.

You could, just in case, try latest upstream and enable
CONFIG_EDAC_SBRIDGE and check dmesg for some ECCs.

Or, of course, something else entirely might be funny with that box,
causing that interrupt to fire.

You are right, I enabled that on the tip of master and I get the
following spewing out for a while:

EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)

So there's apparently something broken in the hardware.
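
For anyone who wants to tally these from dmesg, here is a rough sketch of pulling the fields out of a line like the one above. The regular expression and the parse_edac_line() helper are illustrative assumptions based only on this single sample line; they are not part of the EDAC driver or any kernel tool, and other drivers or kernel versions may format the message differently.

```python
import re

# Pattern derived from the single sample line in this thread (an assumption,
# not a documented format).
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<severity>CE|UE) "
    r"(?P<message>.+?) on (?P<label>\S+) \((?P<details>.*)\)"
)

SAMPLE = (
    "EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 "
    "(channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 - "
    "OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)"
)


def parse_edac_line(line):
    """Return a dict describing one EDAC report, or None if it does not match."""
    m = EDAC_RE.search(line)
    if m is None:
        return None
    fields = {}
    flags = []
    for token in m.group("details").split():
        # Split at the first colon only, so "err_code:0001:0091" keeps its value whole.
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
        elif token != "-":
            # Bare words such as OVERFLOW carry no value; keep them as flags.
            flags.append(token)
    return {
        "mc": int(m.group("mc")),
        "count": int(m.group("count")),
        "severity": m.group("severity"),
        "message": m.group("message"),
        "label": m.group("label"),
        "fields": fields,
        "flags": flags,
    }


if __name__ == "__main__":
    report = parse_edac_line(SAMPLE)
    print(report["label"], report["count"], report["severity"], report["flags"])
```

Feeding it the output of dmesg line by line and grouping on the label would show whether the errors all come from one DIMM.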

But as you say, the kernel should be ready for this.
Right, and we've removed that mce_notify_irq() call in
intel_threshold_interrupt() with

   f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")

but that's more of a side-effect of that patch.

And if you want to backport it, you'd need the mce_gen_pool_add() and
remaining machinery for the genpool.

That sounds like a bit much.

Steven, what would you like to do here?

Thanks,

-corey

Presumably, booting with "mce=no_cmci" should fix this but then you
won't have the CMCI thresholding, i.e., the interrupt which gets raised
when a certain amount of correctable errors has been generated.
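
To confirm the option actually took effect after a reboot, a small check along these lines might help. It only looks for the literal mce=no_cmci token on the kernel command line; the has_no_cmci() helper is a hypothetical name, and the simple whitespace split is an assumption that ignores quoted arguments.

```python
def has_no_cmci(cmdline):
    """Return True if the kernel command line carries the mce=no_cmci option."""
    # Whitespace split is a simplification; quoted arguments are not handled.
    return "mce=no_cmci" in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print("CMCI disabled via mce=no_cmci:", has_no_cmci(f.read()))
```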

Hmm, a funny box that.




