Sorry, I snipped the C states table from my modified intel_idle.c, not the original. I should have shown

> static const struct x86_cpu_id intel_idle_ids[] = {
> <snip>
> 	ICPU(0x2c, idle_cpu_nehalem),
> <snip>
> 	{}
> };
> MODULE_DEVICE_TABLE(x86cpu, intel_idle_ids);

Larry Baker
US Geological Survey
650-329-5608
baker@xxxxxxxx

On 11 Jun 2013, at 2:30 PM, Larry Baker wrote:

> I have an IBM System x3650 M3 with an Intel Xeon L5630 processor. My IBM support team alerted me to an issue with C6 states for those processors when running Linux and the intel_idle kernel module is used. The IBM solutions page, http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5091901, recommends disabling intel_idle. I propose intel_idle disable C6 for the affected processors.
>
> Description:
>
> Intel Xeon 5600 and Core i7-900 processors (Family 6 Model 44) have a flaw when the C6 state is used. See Intel® Xeon® Processor 5600 Series Specification Update May 2012 (http://www.intel.eu/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf) and Intel® Core™ i7-900 Desktop Processor Extreme Edition Series and Intel® Core™ i7-900 Desktop Processor Series on 32-nm Process Specification Update June 2013 (http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/core-i7-900-ee-and-desktop-processor-series-32nm-spec-update.pdf):
>
>> Package C6 Transitions May Cause Memory Bit Errors to be Observed
>>
>> Problem:
>> During Package C6 transitions, internal signaling noise may cause the DDRx_CKE signal to become asserted during self-refresh. These assertions may result in memory bit errors upon exiting from the package C6 state. Due to this erratum the DDRx_CKE signals can be driven during times in which the DDR3 JEDEC specification requires that they are idle.
>>
>> Implication:
>> DDRx_CKE signals can be driven during package C6 memory self-refresh creating an invalid memory DRAM state. A system hang, memory ECC errors or unpredictable system behavior may occur when exiting the package C6 state.
>>
>> Workaround:
>> It is possible for the BIOS to contain a workaround for this erratum.
>>
>> Status:
>> For the steppings affected, see the Summary Table of Changes.
>
> The intel_idle kernel module uses the standard Nehalem C states table for these processors (boot_cpu_data.x86_model==0x2c):
>
>> static const struct x86_cpu_id intel_idle_ids[] = {
>> <snip>
>> 	ICPU(0x2c, idle_cpu_xeon5600),
>> <snip>
>> 	{}
>> };
>> MODULE_DEVICE_TABLE(x86cpu, intel_idle_ids);
>
> I propose two alternatives for intel_idle.c to avoid the processor flaw. (I didn't try to compile them -- I only modified intel_idle.c.) The first limits max_cstate=3, and prints a console message if the limit was forced. The second creates a Xeon 5600-specific C states table which leaves off the C6 state. No message is printed in that case. I like the idea of a message, and I like the idea of a proper C states table. I did not look to see if acpi_idle can also benefit from a similar modification.
>
> Thank you,
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> baker@xxxxxxxx
>
> ===== Version 1 Patch - Limit max_cstate=3 for Family 6 Model 44 Processors =====
>
> --- intel_idle.c.orig	2013-06-11 11:41:49.000000000 -0700
> +++ intel_idle-fix-v1.c	2013-06-11 13:47:15.000000000 -0700
> @@ -534,2 +534,13 @@
>
> +/*
> + * Disable C6-state for Xeon 5600 and Core i7-900. Refer to Xeon 5600 errata
> + * BD104 and Core i7-900 errata BC82: Package C6 Transitions May Result in
> + * Single and Multi-Bit Memory Errors.
> + */
> +	if (boot_cpu_data.x86_model == 0x2c &&
> +	    max_cstate > 3) {
> +		pr_debug(PREFIX "limiting model 0x2c to max_cstate=3\n");
> +		max_cstate = 3;
> +	}
> +
>  	pr_debug(PREFIX "lapic_timer_reliable_states 0x%x\n",
>
> ===== Version 2 Patch - Disable C6 for Family 6 Model 44 Processors =====
>
> --- intel_idle.c.orig	2013-06-11 11:41:49.000000000 -0700
> +++ intel_idle-fix-v2.c	2013-06-11 13:51:32.000000000 -0700
> @@ -158,2 +158,33 @@
>
> +/*
> + * Disable C6-state for Xeon 5600 and Core i7-900. Refer to Xeon 5600 errata
> + * BD104 and Core i7-900 errata BC82: Package C6 Transitions May Result in
> + * Single and Multi-Bit Memory Errors.
> + */
> +static struct cpuidle_state xeon5600_cstates[CPUIDLE_STATE_MAX] = {
> +	{
> +		.name = "C1-NHM",
> +		.desc = "MWAIT 0x00",
> +		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> +		.exit_latency = 3,
> +		.target_residency = 6,
> +		.enter = &intel_idle },
> +	{
> +		.name = "C1E-NHM",
> +		.desc = "MWAIT 0x01",
> +		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> +		.exit_latency = 10,
> +		.target_residency = 20,
> +		.enter = &intel_idle },
> +	{
> +		.name = "C3-NHM",
> +		.desc = "MWAIT 0x10",
> +		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> +		.exit_latency = 20,
> +		.target_residency = 80,
> +		.enter = &intel_idle },
> +	{
> +		.enter = NULL }
> +};
> +
>  static struct cpuidle_state snb_cstates[CPUIDLE_STATE_MAX] = {
> @@ -440,2 +471,8 @@
>
> +static const struct idle_cpu idle_cpu_xeon5600 = {
> +	.state_table = xeon5600_cstates,
> +	.auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
> +	.disable_promotion_to_c1e = true,
> +};
> +
>  static const struct idle_cpu idle_cpu_atom = {
> @@ -472,3 +509,3 @@
>  	ICPU(0x25, idle_cpu_nehalem),
> -	ICPU(0x2c, idle_cpu_nehalem),
> +	ICPU(0x2c, idle_cpu_xeon5600),
>  	ICPU(0x2e, idle_cpu_nehalem),
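
For anyone who wants to exercise the Version 1 logic outside the kernel, here is a minimal userspace sketch in C. It is not part of either patch and is untested against real hardware. It decodes the family and model from a CPUID leaf 1 EAX signature using the usual extended-family/extended-model rule, then applies the same "clamp to 3" decision as the Version 1 patch. All function names (sig_family, sig_model, affected_by_c6_erratum, clamp_max_cstate) are hypothetical and do not exist in intel_idle.c. Unlike the Version 1 patch, the sketch also checks family 6 explicitly, since it has no driver match table in front of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative helpers only. The names below are hypothetical and are
 * not taken from intel_idle.c or any kernel header.
 */

/* Display family from a CPUID leaf 1 EAX signature. */
unsigned int sig_family(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;

	/* The extended family field is only added when the base family is 0xf. */
	if (family == 0xf)
		family += (eax >> 20) & 0xff;
	return family;
}

/* Display model from a CPUID leaf 1 EAX signature. */
unsigned int sig_model(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;
	unsigned int model = (eax >> 4) & 0xf;

	/* The extended model field supplies the high nibble for families 6 and 0xf. */
	if (family == 0x6 || family == 0xf)
		model |= ((eax >> 16) & 0xf) << 4;
	return model;
}

/* Family 6 Model 44 (0x2c) is the part the quoted erratum text refers to. */
bool affected_by_c6_erratum(uint32_t eax)
{
	return sig_family(eax) == 6 && sig_model(eax) == 0x2c;
}

/* Same decision as the Version 1 patch: cap max_cstate at 3 on affected parts. */
int clamp_max_cstate(uint32_t eax, int max_cstate)
{
	assert(max_cstate >= 0);

	if (affected_by_c6_erratum(eax) && max_cstate > 3)
		return 3;
	return max_cstate;
}
```

As a usage example, a signature of 0x000206C2 decodes to family 6, model 0x2c, so clamp_max_cstate(0x000206C2, 7) yields 3, while a family 6 model 0x1a signature such as 0x000106A5 is left alone. The signature values are assumptions chosen for illustration; on a real system the value would come from the CPUID instruction or from /proc/cpuinfo.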