"Ahmed S. Darwish" <darwish.07 at gmail.com> writes:

> On Sat, Feb 26, 2011 at 04:57:30PM -0800, Eric W. Biederman wrote:
>> "H. Peter Anvin" <hpa at zytor.com> writes:
>> >
>> > I can't see any sane reason to *not* make kexec purgatory
>> > position-independent. It is the obvious thing to do.
>>
>> This isn't a case of the code not being position independent. This is
>> a case of where the relocations are applied.
>>
>> I can see a couple of ways of handling this, with different tradeoffs.
>>
>> 1) We teach bootloaders how to load two kernels at once. This
>> completely avoids the purgatory, as it is replaced by code in the
>> bootloader that already exists to load the primary kernel and set up
>> its arguments.
>>
>
> This is in fact my plan. Using Syslinux, I loaded 'purgatory.ro' into RAM
> thinking that it would still be needed. Re-checking the purgatory code
> now after reading the above note, it seems it does 5 important things:
>
> a) reset the VGA (if instructed)
> b) reset the PIC to legacy mode (if instructed)
> c) check the overall integrity of the second kernel image (SHA-2)
> d) set up the environment for second kernel entry (switch back to
> 32-bit protected mode in x86-64, reset registers, etc.)
> e) save the first 640K in a backup region
>
> So (a) and (b) can be done elsewhere if needed; (c) isn't needed because
> if the bootloader corrupts images, we have bigger problems; (d) can be
> done as a stub; (e), unlike in kdump, isn't critical for my goals.

(c) is needed somewhere on the initialization path, because we don't
start running until after a kernel has crashed, and a crashing kernel
may have scribbled over the loaded image. For a first prototype it can
probably be skipped.

(e) is there because the first 640K is the only memory of the original
kernel that we use.

I suspect that copying the first 640K somewhere reserved for it and
verifying the SHA-256 checksum are things we can move into the kernel's
boot path. But seriously, prototype it and get something that works.
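A minimal sketch of steps (c) and (e) as they might look once moved out of purgatory and into the second kernel's early boot path. It is illustration only: the segment descriptor, the function names and the stored digest value are hypothetical, and FNV-1a stands in for the SHA-256 digest purgatory actually computes, purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one segment loaded for the second kernel. */
struct segment {
    const uint8_t *start;
    size_t len;
};

/* The first 640K: the only memory of the original kernel that gets reused. */
#define LOW_MEM_SIZE (640u * 1024u)

/*
 * FNV-1a (32-bit) over all segments, in order. This is a short stand-in
 * for SHA-256, not what purgatory really uses.
 */
static uint32_t digest_segments(const struct segment *seg, size_t nseg)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (size_t i = 0; i < nseg; i++) {
        for (size_t j = 0; j < seg[i].len; j++) {
            h ^= seg[i].start[j];
            h *= 16777619u;              /* FNV prime */
        }
    }
    return h;
}

/* Returns 1 if the image still matches the digest recorded at load time. */
static int verify_segments(const struct segment *seg, size_t nseg,
                           uint32_t expected)
{
    return digest_segments(seg, nseg) == expected;
}

/* Copy low memory into a region reserved for it before it gets reused. */
static void backup_low_memory(uint8_t *backup, const uint8_t *low)
{
    assert(backup != NULL && low != NULL);
    memcpy(backup, low, LOW_MEM_SIZE);
}
```

In this sketch the digest would be computed over the same segments at load time and checked once the second kernel starts; because the digest runs across segment boundaries, splitting the image into several segments does not change the result. The low-memory copy has to happen before anything in the new kernel touches the first 640K.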
In practice I don't know of a case where I have gotten a checksum
failure. Saving the first 640K is somewhat important, but again, we
don't do much down there except boot secondary CPUs, so you can
probably deal with that later.

There is also some magic we do with ELF headers to describe memory
regions and to find the ELF notes written by the crashed kernel when
it goes down. The existing tools use those notes to find all kinds of
things. See vmcore-to-dmesg in the /sbin/kexec source tree. If you
don't want the full core, I expect you will still want to be able to
run that program.

I'm not ready to change how the crash recovery kernel finds out what
is going on: the ELF header and ELF notes. That interface is already
kernel agnostic, etc., but I am totally willing to change how we
implement it.

Eric
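The ELF notes mentioned above are a flat sequence of records: a header (name size, descriptor size, type) followed by a name and a descriptor, each padded to 4 bytes as in Linux-generated core files. A minimal sketch of looking one up inside a PT_NOTE segment, using the Elf64_Nhdr type from <elf.h>; find_note() is a hypothetical helper, not a kexec-tools function. Per-CPU register state from a crashed kernel is typically found in "CORE"/NT_PRSTATUS notes, and a "VMCOREINFO" note carries the kind of layout information a tool like vmcore-to-dmesg needs.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note names and descriptors are padded to 4-byte boundaries. */
#define NOTE_ALIGN(x) (((size_t)(x) + 3u) & ~(size_t)3u)

/*
 * Hypothetical helper: scan the contents of a PT_NOTE segment and return
 * the descriptor of the first note with the given name and type, or NULL.
 * The descriptor length is stored in *desc_len when desc_len is non-NULL.
 */
static const void *find_note(const uint8_t *buf, size_t len,
                             const char *name, uint32_t type,
                             uint32_t *desc_len)
{
    assert(buf != NULL && name != NULL);
    size_t want = strlen(name) + 1;      /* n_namesz counts the trailing NUL */
    size_t off = 0;

    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        memcpy(&nh, buf + off, sizeof nh);   /* avoid unaligned access */

        /* Reject sizes that cannot fit, then bound the whole record. */
        if (nh.n_namesz > len || nh.n_descsz > len)
            break;
        size_t name_off = off + sizeof nh;
        size_t desc_off = name_off + NOTE_ALIGN(nh.n_namesz);
        size_t next = desc_off + NOTE_ALIGN(nh.n_descsz);
        if (next > len)
            break;                           /* truncated note */

        if (nh.n_type == type && nh.n_namesz == want &&
            memcmp(buf + name_off, name, want) == 0) {
            if (desc_len != NULL)
                *desc_len = nh.n_descsz;
            return buf + desc_off;
        }
        off = next;
    }
    return NULL;
}
```

Matching on both name and type matters because note types are only meaningful within a name's namespace; a truncated or malformed record simply ends the scan instead of reading past the segment.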
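The "ELF headers to describe memory regions" part works through PT_LOAD program headers: each one records which range of the crashed kernel's RAM is stored at which offset in the core file. A sketch of the translation a reader of that core has to perform, using the Elf64_Phdr type from <elf.h>; paddr_to_offset() is hypothetical, and it assumes p_paddr holds the physical start address of each region.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a physical address of the crashed kernel to an
 * offset in the ELF core file using its PT_LOAD program headers.
 * Returns 0 and sets *offset on success, -1 if no region covers paddr.
 */
static int paddr_to_offset(const Elf64_Phdr *ph, size_t nph,
                           uint64_t paddr, uint64_t *offset)
{
    assert(ph != NULL && offset != NULL);
    for (size_t i = 0; i < nph; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;                    /* skip PT_NOTE and others */
        /* Only bytes actually present in the file (p_filesz) are readable. */
        if (paddr >= ph[i].p_paddr && paddr - ph[i].p_paddr < ph[i].p_filesz) {
            *offset = ph[i].p_offset + (paddr - ph[i].p_paddr);
            return 0;
        }
    }
    return -1;
}
```

Checking against p_filesz rather than p_memsz keeps the lookup from returning offsets for memory that was described but never written into the file.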