On 06/21/2013 02:06 PM, Borislav Petkov wrote:
> On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote:
>> Yes, but I'm afraid this won't work either - mce_banks_owned is
>> cleared during cpu offline. This is necessary since a cmci
>> rediscover is triggered on cpu offline, so that if this bank is
>> shared across cores, a different cpu can claim ownership of this
>> bank.
>
> What for? Sounds strange to me.

Look at section "15.5.1 CMCI Local APIC Interface" in Intel SDM Vol. 3,
and the subsequent section, "System Software Recommendation for
Managing CMCI and Machine Check Resources":

"For example, if a corrected bit error in a cache shared by two logical
processors caused a CMCI, the interrupt will be delivered to both
logical processors sharing that microarchitectural sub-system."

In other words, some of the MC banks are shared across logical cpus in a
core and some across all cores in a package. During initialization, the
first cpu in a core ends up owning most of the banks specific to the
core/package. When this cpu is offlined, we would want the second cpu in
that core to discover and enable CMCI for those MC banks which it shares
with the first cpu.

As an example, consider a hypothetical single-core Intel processor with
Hyperthreading. On init, let's say the first cpu ends up owning banks 1,
2, 3 and 4, and the second cpu ends up owning banks 1 and 2. This would
mean that MC banks 1 and 2 are "hyperthread"-specific (each cpu owns its
own copy), while banks 3 and 4 are shared. Now, if we offline the first
cpu, it disables CMCI on all 4 banks it owns, including the shared banks
3 and 4. So, if we now do a cmci rediscovery, the second cpu will see
that banks 3 and 4 don't have CMCI enabled and will then claim ownership
of those, so that we can continue to receive and process CMCIs from
those subsystems.
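
To make the example concrete, here is a toy userspace model of it. This
is only a sketch and not the kernel code: every toy_* name is made up
for illustration, banks are indexed 0..3 instead of 1..4, and the layout
(two per-thread banks, two shared banks) is the hypothetical one from
the example above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CPUS  2	/* two hyperthreads of one core (hypothetical) */
#define TOY_BANKS 4	/* banks 1..4 of the example, indexed 0..3 here */

/* Assumption: banks 0 and 1 are per-thread, banks 2 and 3 are shared. */
static const bool toy_shared[TOY_BANKS] = { false, false, true, true };

static bool toy_cmci_en[TOY_CPUS][TOY_BANKS];	/* CMCI enable bit per bank */
static bool toy_owned[TOY_CPUS][TOY_BANKS];	/* stand-in for mce_banks_owned */
static bool toy_online[TOY_CPUS];

/* A shared bank has a single enable bit for both threads; keep it in row 0. */
static bool *toy_en_bit(int cpu, int bank)
{
	assert(cpu >= 0 && cpu < TOY_CPUS && bank >= 0 && bank < TOY_BANKS);
	return &toy_cmci_en[toy_shared[bank] ? 0 : cpu][bank];
}

/* Claim every bank whose CMCI enable bit is not set yet. */
static void toy_cmci_discover(int cpu)
{
	for (int bank = 0; bank < TOY_BANKS; bank++) {
		bool *en = toy_en_bit(cpu, bank);

		if (*en)
			continue;	/* already claimed, by us or by a sibling */
		*en = true;
		toy_owned[cpu][bank] = true;
	}
}

/* Disable CMCI on every bank this cpu owns and give up ownership. */
static void toy_cmci_clear(int cpu)
{
	for (int bank = 0; bank < TOY_BANKS; bank++) {
		if (!toy_owned[cpu][bank])
			continue;
		*toy_en_bit(cpu, bank) = false;
		toy_owned[cpu][bank] = false;
	}
}

static void toy_cpu_online(int cpu)
{
	toy_online[cpu] = true;
	toy_cmci_discover(cpu);
}

/* On offline: clear our banks, then let the remaining cpus rediscover. */
static void toy_cpu_offline(int cpu)
{
	toy_online[cpu] = false;
	toy_cmci_clear(cpu);
	for (int other = 0; other < TOY_CPUS; other++)
		if (toy_online[other])
			toy_cmci_discover(other);
}

static bool toy_owns(int cpu, int bank)
{
	return toy_owned[cpu][bank];
}
```

Calling toy_cpu_online(0) and then toy_cpu_online(1) leaves cpu 0 owning
all four banks and cpu 1 owning only its own banks 0 and 1. After
toy_cpu_offline(0), the rediscovery pass lets cpu 1 pick up the shared
banks 2 and 3. Without that pass, nobody would have CMCI enabled on them.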

Makes sense now?

Thanks,
Naveen