[patch 0/9] kdump: Patch series for s390 support

holzheu@xxxxxxxxxxxxxxxxxx (Michael Holzheu) · Wed, 06 Jul 2011 11:24:47 +0200

Hello Vivek,

On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:

[snip]

> I don't understand what is stand-alone dump tools and 

S390 stand-alone dump tools are independent mini operating systems that
are installed on disks or tapes. When a dump is needed, one of these
stand-alone dump tools is booted. All it does is write the dump
(current memory plus the CPU registers) to the disk/tape device.

The advantage compared to kdump is that, since they are freshly loaded
into memory, they can't have been overwritten in memory. Another
advantage is that, since it is different code, it is much less likely
that the dump tool will run into the same problem as the previously
crashed kernel. Also, the boot process ensures that the hardware is in
an initialized state. And last but not least, with the stand-alone dump
tools you can dump early kernel problems, which is not possible using
kdump, because you can't dump before the kdump kernel has been loaded
with kexec.

Those were more or less the reasons why we did not support kdump in
the past.

In order to increase dump reliability with kdump, we have now
implemented a two-stage approach. The stand-alone dump tools first use
meminfo and checksums to check whether kdump is valid. If kdump is
loaded and healthy, it is started. Otherwise the stand-alone dump tools
create a full-blown stand-alone dump.
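
To make the decision logic concrete, here is a minimal sketch in Python
(not the actual dump tool code). The record layout, the field names and
the use of CRC32 are assumptions for illustration only; the real
meminfo format and checksum are defined by the patches.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class KdumpEntry:
    # Hypothetical record describing the loaded kdump kernel.
    addr: int      # start of the image in memory
    size: int      # length of the image in bytes
    checksum: int  # checksum stored when the image was loaded


def checksum(data: bytes) -> int:
    # CRC32 is only a stand-in for whatever checksum is really used.
    return zlib.crc32(data) & 0xFFFFFFFF


def choose_dump_method(memory: bytes, entry: Optional[KdumpEntry]) -> str:
    # Stage 1: is a kdump kernel registered at all?
    if entry is None:
        return "stand-alone"
    end = entry.addr + entry.size
    if entry.size == 0 or end > len(memory):
        return "stand-alone"
    # Stage 2: was the image overwritten since it was loaded?
    if checksum(memory[entry.addr:end]) != entry.checksum:
        return "stand-alone"
    return "kdump"


image = b"kdump kernel image"
memory = bytearray(64)
memory[16:16 + len(image)] = image
entry = KdumpEntry(addr=16, size=len(image), checksum=checksum(image))

print(choose_dump_method(bytes(memory), entry))  # kdump
memory[20] ^= 0xFF  # simulate the crashed kernel scribbling on the image
print(choose_dump_method(bytes(memory), entry))  # stand-alone
```

The point of the sketch is that every failed check falls back to the
stand-alone dump, so a damaged kdump image never reduces reliability.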

With this approach we still keep our s390 dump reliability and gain the
great kdump features, e.g. distributor installer support, dump filtering
with makedumpfile, etc.

> why the existing
> mechanism of preparing ELF headers to describe all the above info
> and just passing the address of header on kernel command line
> (crashkernel=) will not work for s390. Introducing an entirely new
> infrastructure for communicating the same information does not
> sound too exciting.

We need the meminfo interface anyway for the two-stage approach. The
stand-alone dump tools have to find and verify the kdump kernel in order
to start it. Therefore the interface is there and can be used. Also,
creating the ELF header in the 2nd kernel is more flexible and easier
IMHO:
* You do not have to care about memory or CPU hotplug.
* You do not have to preallocate CPU crash notes etc.
* It works independently of the tool/mechanism that loads the kdump
  kernel into memory. E.g. we are considering loading the kdump kernel
  at boot time into the crashkernel memory (not via the kexec_load
  system call). That would solve the main kdump problems: The kdump
  kernel can't be overwritten by I/O, and early kernel problems could
  then also be dumped using kdump.
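
To illustrate the hotplug point, here is a minimal Python sketch (not
the kernel code) that builds the PT_LOAD program headers from whatever
memory chunks exist at dump time, so nothing has to be preallocated or
updated on hotplug. The chunk list and the function name are
hypothetical, and setting p_vaddr equal to p_paddr is a simplifying
assumption; only the Elf64_Phdr layout is taken from the ELF
specification.

```python
import struct

PT_LOAD = 1
PF_RWX = 7  # PF_R | PF_W | PF_X
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align; big-endian as on s390x.
PHDR_FMT = ">IIQQQQQQ"


def build_load_phdrs(chunks, first_offset):
    """Return packed PT_LOAD headers for (start, size) memory chunks."""
    phdrs = []
    offset = first_offset  # file offset of the first chunk's data
    for start, size in chunks:
        phdrs.append(struct.pack(PHDR_FMT, PT_LOAD, PF_RWX, offset,
                                 start, start, size, size, 0))
        offset += size
    return b"".join(phdrs)


# Hypothetical memory layout as seen at dump time.
chunks = [(0x0, 0x1000), (0x2000, 0x3000)]
headers = build_load_phdrs(chunks, first_offset=0x200)
print(len(headers) // struct.calcsize(PHDR_FMT))  # 2
```

Because the headers are derived from the memory layout found at dump
time, a chunk added or removed by hotplug simply shows up (or not) in
the list.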

What do you think?

Michael