On Dec 11, 2007 4:52 PM, Neil Horman <nhorman at tuxdriver.com> wrote:
> On Tue, Dec 11, 2007 at 04:16:32PM -0800, Ben Woodard wrote:
> > We may need to go back and do some additional work on this. It doesn't
> > seem to be quite as cut and dried as we initially thought.
> >
> > This quirk doesn't appear to work on virtually the same motherboard with
> > the Barcelona processors in it. It also may be sensitive to the firmware
> > version. More extensive testing on a larger number of pre-production
> > systems is not showing it to be as effective as it initially appeared
> > to be on the testbed.
> >
> > I'm doing some retesting to figure out which exact situations and
> > collection of patches made it work before.
> >
> Ben, please let's be clear about this. You say this patch doesn't help on a
> new system. Even though it's almost the exact same system, it's not the same
> system. Does this patch work consistently on the system you initially
> reported the problem on? I've done enough work on this at this point that
> I'm invested in not abandoning this fix. If this solves the problem on the
> dual-core system but not the quad-core one, I'd much rather move forward
> with this fix and address your quad-core problem as a separate issue.
>
> Neil
>
> > -ben
> >
> > Neil Horman wrote:
> > > Recently a kdump bug was discovered in which a system would hang inside
> > > calibrate_delay while booting the kdump kernel. This was caused by the
> > > jiffies counter not being incremented during timer calibration. The
> > > root cause was found to be a BIOS misconfiguration of the
> > > HyperTransport bus. On systems affected by this hang, the BIOS had
> > > assigned APIC IDs that use the extended APIC bits (more than the nominal
> > > 4-bit IDs), but failed to set bit 17 of the HyperTransport transaction
> > > config register, which selects the mask for the destination field of
> > > interrupt packets across the HT bus (see section 3.3.9 of
> > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF).
> > > If a crash occurs on a CPU whose APIC ID extends beyond 4 bits, that CPU
> > > will not receive interrupts during the kdump kernel boot, and this hang
> > > is the result. The fix is this patch, which adds an early PCI quirk
> > > check to forcibly enable this bit in the httcfg register. This enables
> > > all CPUs on a system to receive interrupts and allows kdump kernel
> > > bootup to proceed normally.
> > >
> > > Regards
> > > Neil
> > >
> > >
> > > Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
> > > ...
> > > static struct chipset early_qrk[] __initdata = {
> > > - { PCI_VENDOR_ID_NVIDIA, nvidia_bugs },
> > > - { PCI_VENDOR_ID_VIA, via_bugs },
> > > - { PCI_VENDOR_ID_ATI, ati_bugs },
> > > + { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, nvidia_bugs },
> > > + { PCI_VENDOR_ID_VIA, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, via_bugs },
> > > + { PCI_VENDOR_ID_ATI, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, ati_bugs },
> > > + { PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB, PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, fix_hypertransport_config },

==> + { PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB, PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, fix_hypertransport_config },
    + { PCI_VENDOR_ID_AMD, 0x1200, PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, fix_hypertransport_config },

I still think the better way is to ask Supermicro to update their BIOS to use
the newer code from AMD.

YH