[patch v2 04/10] kdump: Trigger kdump via panic notifier chain on s390

vgoyal@xxxxxxxxxx (Vivek Goyal) · Thu, 4 Aug 2011 17:14:32 -0400

On Wed, Aug 03, 2011 at 11:50:39AM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Tue, 2011-08-02 at 15:21 -0400, Vivek Goyal wrote:
> > > We have added the panic notifier in the past in order to be able to
> > > configure the action that should be done in case of panic using our
> > > shutdown actions infrastructure. We can configure the action using sysfs
> > > and we are able to configure that a stand-alone dump should be started
> > > as action for panic.
> > > 
> > > Now with the two stage dump approach we would like to keep the
> > > possibility to trigger a stand-alone dump even if kdump is installed.
> > > The stand-alone dumper will be started in case of a kernel panic and
> > > then the procedure we discussed will happen: Jump into kdump and if
> > > program check occurs do stand-alone dump as backup.
> > 
> > Frankly speaking this jumping to stand alone kernel by default is not
> > making any sense to me. Once you have already determined from /sys that
> > in case of crash a user has set the action to kdump, then we should
> > simply call crash_kexec()
> 
> If the user has set the panic action to kdump, we jump directly to
> crash_kexec(). This then works like on all other architectures.

OK, it's good to know that you will also define kdump as a panic action,
and that in that case we will not jump to the dump tools but will call
crash_kexec() directly.

> 
> Only if the user has specified panic action stand-alone dump, we do the
> detour via the stand-alone dump tools.

If a user decides to load a kdump kernel to capture the dump, why does it
still make sense to set the panic action to "stand-alone dump tools"? One
could argue that the user loaded a kdump kernel but does not necessarily
want that mechanism to be used; in that case the dump tools do not have
to jump to the kdump kernel at all.

> 
> > like other architectures and jump to stand
> > alone kernel only if some piece of code is corrupted and that action
> > failed.
> > 
> > What's the point of jumping to the stand-alone kernel in case of panic()
> > and then re-entering the original kernel using crash_kexec()? It sounds
> > like a very odd design choice to me.
> > 
> > I am now repeating this question for the umpteenth time simply because
> > I never got a good answer except "we have to do it this way".
> 
> Sometimes communication is really hard and frustrating.
> ... but at least we are still communicating.
> 
> Ok very last try:
> 
> * We can use the same mechanism for manual dump and automatic dump on
> panic: IPL the stand-alone dump tools.

So manual dump/intervention is only required if automatic dump failed?

> kdump check and backup
> stand-alone dump is implemented only in the stand-alone dump code.

My question is why the stand-alone dump is trying to trigger kdump
at all. Shouldn't that all be part of loading the kdump kernel and the
user setting the panic() action to kdump?

The only valid argument for trying to load the kdump kernel from the dump
tools is the hard hang situation where we never made it to panic(). Then
either the hypervisor timer or manual intervention comes into the picture,
and one might argue that we should still give kdump a try.

For x86 it is relatively easy, as the NMI detects a hard hang in the
context of the first kernel and can easily call crash_kexec() without
passing any additional information.

So if it is about a hard hang, I can still understand the need to jump
to crash_kexec() from the dump tools. I don't know whether it is possible
to invoke crash_kexec() directly from the hypervisor timer without IPLing
the dump tools.

> If we
> would do it like you suggested, we would have to do it twice - in the
> kernel and in the stand-alone dump tools:
> - kernel: Try kdump and if kdump fails trigger standalone dump tool
> - Stand-alone dump tool: Try kdump and if kdump fails do full dump

Are we not already doing the above two steps? You just mentioned that
if the user specified "kdump" as the panic() action, then you will call
crash_kexec() directly. Will we not jump to the dump tools if kdump
fails?

Also, if the user specified "dump-tools" as the action, then your scheme
will try to execute kdump anyway (if a kernel is loaded), and if that
fails we come back to the dump tools.

So I think in the current scheme of things, you already have both implemented.
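
To make the two paths concrete, here is a rough model of the scheme as
it has been described in this thread. It is only a sketch: all names
(on_panic, dump_tools_entry, try_kdump, the enums) are hypothetical and
not actual s390 kernel symbols, and the fallback from a failed kdump in
the kernel to the dump tools is an assumption, since that is exactly the
question asked above.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum panic_action { ACTION_KDUMP, ACTION_DUMP_TOOLS };
enum dump_result { RESULT_KDUMP, RESULT_FULL_DUMP };

/* Stand-in for crash_kexec(): works only if a kdump kernel is
 * loaded and has not been corrupted. */
bool try_kdump(bool loaded, bool intact)
{
	return loaded && intact;
}

/* Stand-alone dump tools: try kdump, fall back to a full dump. */
enum dump_result dump_tools_entry(bool loaded, bool intact)
{
	if (try_kdump(loaded, intact))
		return RESULT_KDUMP;
	return RESULT_FULL_DUMP;
}

/* Kernel side: with action "kdump", try kdump directly; otherwise
 * (or, as assumed here, on failure) IPL the dump tools. */
enum dump_result on_panic(enum panic_action action, bool loaded, bool intact)
{
	if (action == ACTION_KDUMP && try_kdump(loaded, intact))
		return RESULT_KDUMP;
	return dump_tools_entry(loaded, intact);
}
```

In this model both entry points contain the "try kdump, then fall back"
step: on_panic(ACTION_KDUMP, ...) tries it in the kernel, and
on_panic(ACTION_DUMP_TOOLS, ...) reaches the same attempt via the dump
tools.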
> 
> * Still the panic action is configured via sysfs as the user is already
> used to on s390.

I asked above why it makes sense to configure the panic action as dump
tools if a kdump kernel is loaded.

> 
> * It fits much better into our whole s390 infrastructure. Believe me, we
> have discussed that here a long time. I think you do not have a full
> overview here. Perhaps you just have to believe that.

That's what I am talking about. So many times the answer has been "We have
to do it this way".

Sure, I do not have the full overview here, but a little explanation
sometimes helps in understanding things.

Thanks
Vivek