On Mon, Oct 20, 2008 at 12:05:43PM -0700, Anirban Sinha wrote:

> Thanks for responding and posting the patch. There is actually another
> important issue of a more general nature. I have already posted this in
> the general Linux kernel mailing list under the subject "panic() logic".
> The crux of the issue is:
>
> The panic() call does a smp_send_stop() pretty early in the call
> process for SMP systems. smp_send_stop basically marks all the other
> cores as 'down' and updates the cpu bitmap. One implication of this is
> that you cannot do an IPI later on to other cores. However,
> interestingly, mips sibyte processor tries to do a cfe_exit() through
> an IPI as a part of emergency_reboot() that is called pretty late in
> the panic() logic.
>
> As a consequence of this, if a panic happens on a back core, the system
> simply hangs and never actually does a "rebooting in 5 sec" thing.

Interesting.  I've observed this effect frequently.  But without
researching the issue further I blamed CFE for it.

> I believe the way panic logic is organized is in conflict with the
> requirements of some archs, for example our mips sibyte arch. Currently,
> the arch independent logic defeats the main purpose of the arch
> dependent emergency_restart() function which is to restart the system.
> In a vast majority of the cases, we do have a perfectly sane and
> functional front core and we are just not able to gracefully reboot the
> system because we are limited by the way panic() handles the shutdown
> logic. If there are other archs that do a similar specific operation
> for the front core as a part of 'emergency restart', they are all
> defeated.

SMP systems generally have some sledgehammer mechanism that can be used
to trigger a hardware reset of another or all cores.  We probably should
use that instead of relying on firmware, which in many cases becomes
unusable after Linux initialization.

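To make the ordering problem concrete, here is a toy userspace model of it.
This is not kernel code: every name below (model_online_mask,
model_smp_send_stop, model_send_ipi, model_emergency_restart, model_panic)
is made up for illustration, and the only behaviour it assumes is what was
described above, namely that an IPI cannot reach a core already marked down
and that the firmware exit has to run on CPU 0.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4
#define MODEL_ALL_CPUS ((1u << MODEL_NR_CPUS) - 1)

/* Hypothetical stand-in for the online-CPU bitmap. */
unsigned int model_online_mask = MODEL_ALL_CPUS;

/* Model of smp_send_stop(): every CPU except the caller is marked down. */
void model_smp_send_stop(int self)
{
    model_online_mask = 1u << self;
}

/* Model of an IPI: delivered only if the target is still marked online. */
bool model_send_ipi(int target)
{
    return ((model_online_mask >> target) & 1u) != 0;
}

/*
 * Model of the restart path: the firmware exit must run on CPU 0, so any
 * other CPU has to reach CPU 0 with an IPI. Returns true if the restart
 * request gets through.
 */
bool model_emergency_restart(int self)
{
    if (self == 0)
        return true;           /* already on the front core */
    return model_send_ipi(0);  /* fails once CPU 0 is marked down */
}

/* Model of the current ordering: stop the other CPUs first, restart late. */
bool model_panic(int self)
{
    model_online_mask = MODEL_ALL_CPUS;  /* start with all CPUs online */
    model_smp_send_stop(self);
    return model_emergency_restart(self);
}
```

In this model model_panic(0) gets its restart through, while model_panic(1)
does not, because CPU 0 has been marked down before the IPI is attempted.
That matches the hang seen when a back core panics.
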
> I believe, the way to solve this problem is that the archs themselves
> take the responsibility of shutting down the core and not the generic
> panic() call. The actual power down mechanism is arch dependent anyway,
> so I guess it can be made to be a part of emergency_shutdown(). The arch
> independent kernel code will then simply do the necessary arch
> independent things to handle panic and simply call emergency_reboot() to
> do the rest of the arch specific stuff, including powering down the
> cores.

It would certainly make some sense in this particular scenario.

  Ralf
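
P.S. A rough userspace sketch of the proposed split, again only a toy model
and not a patch. All names (sketch_online_mask, sketch_arch_stop_cpus,
sketch_arch_emergency_restart, sketch_generic_panic) are hypothetical. The
assumption is that the arch hook decides which cores to stop, so a
SiByte-like arch can keep CPU 0 reachable until the restart request has
been sent.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_ALL_CPUS ((1u << SKETCH_NR_CPUS) - 1)

/* Hypothetical stand-in for the online-CPU bitmap. */
unsigned int sketch_online_mask = SKETCH_ALL_CPUS;

/* Model of an IPI: delivered only if the target is still marked online. */
bool sketch_send_ipi(int target)
{
    return ((sketch_online_mask >> target) & 1u) != 0;
}

/* Arch code stops every core except the caller and the one it must keep. */
void sketch_arch_stop_cpus(int self, int keep)
{
    sketch_online_mask = (1u << self) | (1u << keep);
}

/*
 * Arch hook: owns both the core shutdown and the restart. CPU 0 stays
 * online so that the restart request can still reach it.
 */
bool sketch_arch_emergency_restart(int self)
{
    sketch_arch_stop_cpus(self, 0);
    return self == 0 || sketch_send_ipi(0);
}

/* Generic panic path: arch-independent work only, then hand off to arch. */
bool sketch_generic_panic(int self)
{
    sketch_online_mask = SKETCH_ALL_CPUS;  /* start with all CPUs online */
    /* Arch-independent panic handling would go here, with no CPU stop. */
    return sketch_arch_emergency_restart(self);
}
```

With this ordering the model reaches the restart from any core, since the
generic path never marks CPU 0 down before the arch code has used it.
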