On Mon, Jul 11, 2011 at 05:56:26PM +0200, Martin Schwidefsky wrote:

[..]

> > kexec-tools purgatory code already has the checksum logic. So you don't
> > have to redo that in stand alone tools. I think you probably need an
> > s390-specific purgatory that jumps to IPLing the stand alone kernel if the
> > kdump kernel is corrupted, instead of rebooting or spinning infinitely
> > in the loop.
> 
> I can not quite follow you here. The purgatory code is part of the kdump kernel,
> no? When we trigger a dump with the stand-alone tools we will start executing
> code in the assembler function of that stand-alone tools. We can not trust
> the kdump kernel yet, not without doing the checksums first.

Purgatory is a separate piece of binary code that is loaded along with the
kdump kernel into the reserved memory area. So yes, there is a chance that
this code itself gets corrupted.

So in the case of stand alone dump, you save the calculated checksum of the
kdump kernel on disk rather than in memory? And then you calculate the
checksum of the in-memory image of the kdump kernel and decide whether the
kdump kernel is corrupted or not? If so, this sounds more reliable, as the
kernel checksum is stored on disk/tape.

[..]

> > Ok. So again, why not reuse the checksum capability of kexec-tools, and
> > instead of looping infinitely, jump to stand alone tools + IPL etc.?
> > I understand this will require a tighter integration with kexec-tools
> > and using the ELF header mechanism, and will not cover the early kernel
> > crashes.
> 
> Imho the checksum of kexec-tools is in the wrong place.

Because you think the stored checksum itself can get corrupted?

[..]

> > To me we seem to be diverging a lot from the existing kdump+kexec-tools
> > mechanism just to solve the case of early crash dumping. If we break
> > down the problem into two parts and do things the kexec-tools way (with a
> > backup path of booting the stand alone kernel if the kdump kernel is
> > corrupted), things might be better.
> 
> The "backup path of booting stand alone kernel" would result in passing
> the control twice, once from the stand-alone dumper to the kexec purgatory
> (after the purgatory checksum has been verified), then doing more checks
> in the kdump kernel, only to return to the stand-alone dumper if some check
> fails. Does not really sound enticing to me.

What I am suggesting is that the stand alone dumper gets control only if the
kdump kernel is corrupted. So the sequence would be:

Kernel crash ---> purgatory ---> either kdump kernel or IPL stand alone tools

The only drawback seems to be that we assume the purgatory code and the
pre-calculated checksum have not been corrupted. The big advantage is that
s390 kdump support looks very similar to other arches, and understanding and
supporting kdump across architectures becomes easy.

Thanks
Vivek
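For illustration only, the proposed "purgatory decides" step could look roughly like the C sketch below. Everything in it is hypothetical: `struct segment`, `segments_checksum()`, `choose_dump_path()` and the enum values are invented names, not kexec-tools or s390 code. A cheap FNV-1a hash stands in for the real digest (the kexec-tools purgatory verifies a SHA-256 digest), and the expected value is assumed to be stored in memory next to purgatory, which is exactly the drawback noted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one segment loaded into the reserved crashkernel area
 * (e.g. the kdump kernel image or its initrd). */
struct segment {
    const uint8_t *buf;
    size_t len;
};

/* Hypothetical outcome of the check performed after a kernel crash. */
enum dump_path { BOOT_KDUMP_KERNEL, IPL_STANDALONE_DUMPER };

/* FNV-1a (32-bit) step, used here only as a stand-in for SHA-256. */
static uint32_t fnv1a_update(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u; /* FNV prime; unsigned wrap-around is well defined */
    }
    return h;
}

/* Hash all segments in order, starting from the FNV offset basis. */
static uint32_t segments_checksum(const struct segment *seg, size_t nr)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < nr; i++)
        h = fnv1a_update(h, seg[i].buf, seg[i].len);
    return h;
}

/* Boot the kdump kernel only if its image is intact; otherwise hand
 * control to the stand alone dumper (IPL). */
static enum dump_path choose_dump_path(const struct segment *seg, size_t nr,
                                       uint32_t expected)
{
    if (segments_checksum(seg, nr) == expected)
        return BOOT_KDUMP_KERNEL;
    return IPL_STANDALONE_DUMPER;
}
```

With this shape, the stand alone dumper is entered at most once, and only when the check fails, instead of passing control back and forth between the dumper and the kdump kernel.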