Hard to debug kernel issues (was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning)


On Sun, 11 Jan 2009 11:26:41 pm David Woodhouse wrote:

> Sometimes you weren't going to get a backtrace if something goes wrong
> _anyway_.

Case in point: we've been struggling with some of our SuperMicro-based
systems with AMD Barcelona B3 (family 10h) CPUs *turning themselves off* when
running various HPC applications.

Nothing in the kernel logs, nothing in the IPMI controller logs. It's just 
like someone has wandered in and held the power button down (and no, it's not 
that).

It's been driving us up the wall.

We'd assumed it was a hardware issue, as it was happening with all sorts of
kernels. But today we tried 2.6.29-rc1 "just in case", and I've not been able
to reproduce the crash (yet) on a node that I can normally crash in about 30
seconds. Rebooting back into 2.6.28 makes it crash again.

If the test boxes are still alive tomorrow, I might see if we can attempt some
form of reverse bisect to track down which commit fixed it (git doesn't seem
to support that, so we're going to have to invert the good/bad commands).

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP


