RE: [PATCH -next v4 2/3] x86/mce: rename MCE_IN_KERNEL_COPYIN to MCE_IN_KERNEL_COPY_MC

> > At least on Intel you can only get a machine check for operation on poison data LOAD.
> > Not for a STORE. I believe that is generally true - other arches to confirm.
>
> So what happens if you store to a poisoned cacheline on Intel? It'll
> raise a poison consumption error when that cacheline is loaded in the
> cache? Because you need to load that line into the cache for writing,
> I'd presume...

There are two places in the pipeline where poison is significant.

1) When the memory controller gets a request to fetch some data. If the ECC
check on the bits returned from the DIMMs finds an uncorrectable error, the
memory controller will log a "UCNA" signature error to a machine check bank
for the memory channel where the DIMMs live. If CMCI is enabled for that
bank, then a CMCI is sent to all logical CPUs that are in the scope of that
bank (generally a CPU socket). The data is marked with a POISON signature
and passed to the entity that requested it. Caches support this POISON
signature and preserve it as data is moved between caches, or written back
to memory. The original request may have been a prefetch or a speculative
read. In these cases there won't be a machine check. Linux
uc_decode_notifier() will try to offline pages when it sees UCNA signatures.

2) When a CPU core tries to retire an instruction that consumes poison
data, or needs to retire a poisoned instruction. These log an SRAR signature
into a core scoped bank (on most Xeons to date bank 0 for poisoned instructions,
bank 1 for poisoned data consumption). Then they signal a machine check.
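
To make the two points concrete, here is a toy C model of what gets logged
and signaled at each place. This is only a sketch of the behaviour described
above. All names (model_sig, model_event, model_mc_fetch, model_core_retire)
are made up for illustration and are not kernel or hardware interfaces.

```c
#include <stdbool.h>

/* Hypothetical signature names for this model only. */
enum model_sig { MODEL_SIG_NONE, MODEL_SIG_UCNA, MODEL_SIG_SRAR };

/* What a single step logs and signals in this model. */
struct model_event {
	enum model_sig logged;	/* signature written to a machine check bank */
	bool cmci;		/* CMCI sent to CPUs in the bank's scope */
	bool machine_check;	/* #MC signaled */
	bool data_poisoned;	/* data handed on with the POISON signature */
};

/* Place 1: memory controller services a fetch (demand, prefetch, speculative). */
struct model_event model_mc_fetch(bool ecc_uncorrectable, bool cmci_enabled)
{
	struct model_event ev = { MODEL_SIG_NONE, false, false, false };

	if (ecc_uncorrectable) {
		ev.logged = MODEL_SIG_UCNA;
		ev.cmci = cmci_enabled;	/* only if CMCI is enabled for the bank */
		ev.data_poisoned = true;
		/* No machine check is signaled at this point. */
	}
	return ev;
}

/* Place 2: core tries to retire an instruction that consumes poison. */
struct model_event model_core_retire(bool consumes_poison)
{
	struct model_event ev = { MODEL_SIG_NONE, false, false, false };

	if (consumes_poison) {
		ev.logged = MODEL_SIG_SRAR;
		ev.machine_check = true;
	}
	return ev;
}
```

In this model a prefetch that hits bad memory only ever goes through
model_mc_fetch(), so it produces a UCNA log (and maybe a CMCI) but never a
machine check.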

> What happens if you have bits flipped in the cacheline you want to write
> to?
>
> That's fine because you're overwriting them anyway?
>
> I'd presume ECC check gets performed on cacheline load and then you'll
> have to raise an #MC...

Partial cacheline stores to data marked as POISON in the cache maintain
the poison status. Full cacheline writes (certainly with MOVDIR64B instruction,
possibly with some AVX512 instructions) can clear the POISON status (since
you have all new data). A sequence of partial cache line stores that overwrite
all data in a cache line will NOT clear the POISON status.

Nothing is logged or signaled when updating data in the cache.
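
The store rules above can be sketched as a toy C model of one cache line.
This is an illustration only: struct model_line and both helpers are
hypothetical names, the 64-byte line size is assumed, and the model assumes
a full-line write always clears POISON (the hardware "can" clear it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_LINE_SIZE 64	/* assumed cache line size */

/* Hypothetical model of a cache line; not a kernel structure. */
struct model_line {
	unsigned char data[MODEL_LINE_SIZE];
	bool poison;
};

/*
 * Partial store: the bytes are updated but the POISON status is kept.
 * Nothing is logged or signaled.
 */
void model_partial_store(struct model_line *line, size_t off,
			 const void *src, size_t len)
{
	assert(off <= MODEL_LINE_SIZE && len <= MODEL_LINE_SIZE - off);
	memcpy(line->data + off, src, len);
	/* line->poison is deliberately left unchanged */
}

/*
 * Full-line write (e.g. MOVDIR64B): all data is new, so this model
 * clears the POISON status.
 */
void model_full_line_store(struct model_line *line, const void *src)
{
	memcpy(line->data, src, MODEL_LINE_SIZE);
	line->poison = false;
}
```

Calling model_partial_store() eight times with 8 bytes each overwrites every
byte of the line, yet poison stays set. Only model_full_line_store() clears
it.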

-Tony



