I just wanted to let folks know what I've been working on, sparc-wise.

I have this recurring issue where one of my workstations hangs completely: no keyboard input, no console messages, nothing.

Since we have pseudo-NMI support in oprofile via performance counters in the current tree, I worked on rearchitecting this so that a nice NMI watchdog layer could be added.

It is modelled after the x86 NMI watchdog, with the major difference being that it is enabled by default. The cost is one interrupt per second, and the payback is enormous in terms of the ability to debug complete system hangs.

Basically, how it works is that if we see no timer interrupts processed for 5 seconds, we print a message, dump registers, and optionally panic the system.

This will be supported on any system that has profiling counter overflow interrupt support. That essentially means any CPU from UltraSPARC-III onward (including Niagara chips).

Another nice side effect of this work is that it gives us some of the framework necessary for whatever generic performance counter layer gets merged into the tree in the future (Ingo Molnar's work, perfmon3, whatever).

I noticed while doing these changes that we need some work in the handling of OOPSes and other errors. In particular, we need to start using the existing generic infrastructure the kernel provides, such as oops_enter(), oops_exit(), bust_spinlocks(), etc. I do intend to work on this.

I'm currently busy testing to make sure that the NMI watchdog and oprofile work as expected. I'll post the patches when I check them in.

I intend to push this into the current stable tree because there are entire classes of bugs people run into which can't be analyzed at all without this kind of facility.
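
For illustration only, the watchdog check described above (one NMI per second, complain after 5 seconds without timer interrupts, optionally panic) can be modelled in plain userspace C roughly as follows. This is a sketch and not the actual patch: every name here (struct cpu_watchdog, watchdog_nmi_tick(), WATCHDOG_TIMEOUT_SECS, the wd_action values) is hypothetical, and the real code would run from the performance counter overflow interrupt and dump registers rather than return a value.

```c
#include <stdbool.h>

/* Hypothetical names throughout; a userspace model, not kernel code. */
#define WATCHDOG_TIMEOUT_SECS 5  /* from the post: 5 seconds without timer interrupts */

enum wd_action {
    WD_OK,      /* timer interrupts are progressing, or timeout not yet reached */
    WD_REPORT,  /* print a message and dump registers */
    WD_PANIC    /* same as WD_REPORT, then panic the system */
};

struct cpu_watchdog {
    unsigned long timer_irqs;       /* bumped by the (simulated) timer interrupt handler */
    unsigned long last_seen_irqs;   /* value of timer_irqs at the previous NMI tick */
    unsigned int  stalled_secs;     /* consecutive NMI ticks with no timer progress */
    bool          panic_on_timeout; /* the "optionally panic" knob */
};

/* Assumed to be called once per second from the (simulated) watchdog NMI. */
static enum wd_action watchdog_nmi_tick(struct cpu_watchdog *wd)
{
    if (wd->timer_irqs != wd->last_seen_irqs) {
        /* Timer interrupts were processed since the last tick: all is well. */
        wd->last_seen_irqs = wd->timer_irqs;
        wd->stalled_secs = 0;
        return WD_OK;
    }

    if (++wd->stalled_secs < WATCHDOG_TIMEOUT_SECS)
        return WD_OK;

    /* Re-arm so a continuing hang is reported again after another full timeout. */
    wd->stalled_secs = 0;
    return wd->panic_on_timeout ? WD_PANIC : WD_REPORT;
}
```

The point of driving this from an NMI-like source is that the check still runs when normal interrupts are blocked, which is exactly the case where the timer count stops advancing.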