On Sun, 11 Jan 2009 11:26:41 pm David Woodhouse wrote: > Sometimes you weren't going to get a backtrace if something goes wrong > _anyway_. Case in point - we've been struggling with some of our SuperMicro based systems with AMD Barcelona B3 k10h CPUs *turning themselves off* when running various HPC applications. Nothing in the kernel logs, nothing in the IPMI controller logs. It's just like someone has wandered in and held the power button down (and no, it's not that). It's been driving us up the wall. We'd assumed it was a hardware issue as it was happening with all sorts of kernels but today we tried 2.6.29-rc1 "just in case" and I've not been able to reproduce the crash (yet) on a node I can crash in about 30 seconds, and rebooting back into 2.6.28 makes it crash again. If the test boxes are still alive tomorrow I might see if we can attempt some form of a reverse bisect to track down what commit fixed it (git doesn't seem to support that so we've going to have to invert the good/bad commands). cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC This email may come with a PGP signature as a file. Do not panic. For more info see: http://en.wikipedia.org/wiki/OpenPGP
Attachment:
signature.asc
Description: This is a digitally signed message part.