Vivek Goyal <vgoyal@xxxxxxxxxx> writes:

> On Tue, Jul 14, 2015 at 05:29:53PM +0000, dwalker@xxxxxxxxxx wrote:
>
> [..]
>> > >> > If a machine is failing, there is a high chance it can't deliver you the
>> > >> > notification. Detecting that failure using some kind of polling mechanism
>> > >> > might be more reliable. And it will make even kdump mechanism more
>> > >> > reliable so that it does not have to run panic notifiers after the crash.
>> > >>
>> > >> I think what you're suggesting is that my company should change how its hardware works
>> > >> and that's not really an option for me. This isn't a simple thing like checking over the
>> > >> network if the machine is down or not, this is way more complex hardware design.
>> > >
>> > > That means you are ready to live with an unreliable design. There might be
>> > > cases where notifier does not get run properly and you will not do switch
>> > > despite the fact that OS has failed. I was just trying to nudge you in
>> > > a direction which could be more reliable mechanism.
>> >
>> > Sigh I see some deep confusion going on here.
>> >
>> > The panic notifiers are just that panic notifiers. They have not been
>> > nor should they be tied to kexec. If those notifiers force a switch
>> > over between machines I fail to see why you would care if it was
>> > kexec or another panic situation that is forcing that switchover.
>>
>> Hidehiro isn't fixing the failover situation on my side, he's fixing register
>> information collection when crash_kexec_post_notifiers is used.
>
> Sure. Given that we have created this new parameter, let us fix it so that
> we can capture the other cpu register state in crash dump.
>
> I am little disappointed that it was not tested well when this parameter was
> introduced. We should have at least tested it to the extent to see if there
> is proper cpu state present for all cpus in the crash dump.
>
> At that point of time it looked like a simple modification
> to allow panic notifiers before crash_kexec().

Either that or we say no one cares enough, and it is known broken, so let's
just revert the fool thing.

I honestly can't see how to support panic notifiers before kexec.
There is no way to tell what is being done and all of the pieces
including smp_send_stop are known to be buggy.

It isn't like this latest set of patches was reviewed/tested much
better, as the first patch was wrong.

Eric