* Ahmed S. Darwish <darwish.07@xxxxxxxxx> wrote:

> Hi,
>
> I've faced some very early panics in latest kernel. Being a run of the mill
> x86 laptop, the machine is void of debugging aids like serial ports or
> network boot.
>
> As a possible solution, below patches prototypes the idea of persistently
> storing the kernel log ring to a hard disk partition using the enhanced BIOS
> 0x13 services.
>
> The used BIOS INT 0x13 functions are the same ones originally used by all
> contemporary bootloaders to load the Linux kernel. If the kernel code is
> already loaded to RAM and being executed, such parts of the BIOS should be
> stable enough.
>
> The basic idea is to switch from 64-bit long mode all the way down to 16-bit
> real-mode. Once in real-mode, we reset the disk controller and write the log
> buffer to disk using a user-supplied absolute disk block address (LBA).
>
> Doing so, we can capture very early panics (along with earlier log messages)
> reliably since the writing mechanism has minimal dependency on any Linux code.
>
> Unfortunately, there are problems on some machines.
>
> In my laptop, when calling the BIOS with the "Reset Disk Controllers" command
> or even issuing a direct "Extend Write" without a controller reset, the BIOS
> hangs for around __5 minutes__. Afterwards, it returns with a 'Timeout' error
> code.
>
> The main problem, it seems, is that the BIOS "Reset controller" command is not
> enough to restore disk hardware to a state understandable by the BIOS code.
>
> So:
>
> - Is it possible to re-initialize the disk hardware to its POST state (thus
>   make the BIOS services work reliably) while keeping system RAM unmodified?
> - If not, can we do it manually by reprogramming the controllers?
>
> The first patch (#1) implements the longMode -> realMode switch and invokes
> the BIOS. The second reserves needed low-memory areas for such code and
> registers a panic logger using the kmsg_dump interface.
>
> Both patches are on '-next' and include XXX marks where further help is also
> appreciated. Please remember that these patches, while tested, are now for
> prototyping the technical feasibility of the idea.
>
> Diffstat:
>
>  arch/x86/kernel/saveoops-rmode.S |  483 ++++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/saveoops.h  |   15 ++
>  arch/x86/kernel/saveoops.c       |  219 +++++++++++++++++
>  arch/x86/kernel/setup.c          |    9 +
>  arch/x86/kernel/Makefile         |    3 +
>  lib/Kconfig.debug                |   15 ++
>  6 files changed, 744 insertions(+), 0 deletions(-)

Ok, i have to admit that while i'm a rabid BIOS-hater i find this debug
feature very very interesting, for the plain reason that if it's implemented
in a robust and clever way then this has a chance to improve debuggability of
pretty much any Linux laptop quite enormously!

While we generally thoroughly hate BIOSes from beginning to end, one thing
can be said, a BIOS bootstraps very early during bootup, and it's relatively
simple to trigger as well.

Also, since latest kernels do not stomp on BIOS data structures anymore (low
RAM), there's some good chance it's still functional at the point of crash -
be that an early crash or a later crash.

I think the biggest areas of practical concern would be:

 - Can this mechanism ever, under any circumstance, corrupt any real data,
   destroy the MBR or do other nasties? Can you think of any additional
   fail-safe measures where you could _further robustify the BIOS calls_ to
   make sure it can never go to the wrong sector(s)? I really do not want to
   think of trusting a BIOS to _write to my disk_.

 - Is there some hidden disk area somewhere on PCs, or somewhere on a real
   partition on typical Linux distributions, which we could use without
   having to reinstall the box? This would increase utility and availability
   enormously. I'm thinking of partition _ends_ which sometimes get rounded
   in an awkward way and which are potentially skipped by most Linux
   filesystems.
   Even a very small area of 512 bytes would be extremely useful for
   debugging weird suspend hangs ...

 - Could we automate the recovery of the dump, and just put it into the
   regular kernel log on the next (successful) bootup (on a feature-enabled
   kernel)? That would make the log of the 'previous crash' very conveniently
   available in dmesg and the syslog. Tools like kerneloops could make use of
   it immediately.

All in all, a very intriguing idea IMO, and the hardest bits (lowlevel x86
transition) are all implemented already.

Thanks,

	Ingo
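
For illustration only, here is a minimal sketch of the kind of fail-safe check raised in the first concern above. It is not taken from the posted patches. The 16-byte Disk Address Packet layout is the standard one for the INT 0x13 extended read/write calls (AH=0x42/0x43); everything else (struct log_region, LOG_MAGIC, build_log_dap() and the idea of checking a signature previously read back from the target area) is a hypothetical assumption. The packet is only filled in if the target range lies entirely inside a pre-agreed log area, never touches LBA 0, the area carries the expected signature, and the buffer is reachable from real mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE   512u
#define LOW_MEM_LIMIT 0x100000u   /* real mode can address only the first 1 MiB */
#define LOG_MAGIC     "SAVEOOPS"  /* hypothetical on-disk signature */

/* Disk Address Packet passed to INT 0x13, AH=0x43 (extended write). */
struct dap {
	uint8_t  size;      /* packet size in bytes: 0x10 */
	uint8_t  reserved;  /* must be zero */
	uint16_t count;     /* number of sectors to transfer */
	uint16_t buf_off;   /* real-mode offset of the buffer */
	uint16_t buf_seg;   /* real-mode segment of the buffer */
	uint64_t lba;       /* absolute starting sector */
} __attribute__((packed));

static_assert(sizeof(struct dap) == 16, "DAP must be exactly 16 bytes");

/* Hypothetical: the only disk area the logger may ever write to. */
struct log_region {
	uint64_t first_lba;
	uint64_t nr_sectors;
};

/*
 * Fill in *dap only if every sanity check passes; otherwise return false
 * and leave *dap untouched. region_header points to the first bytes of the
 * log area as read back from disk earlier (an assumption of this sketch).
 */
static bool build_log_dap(struct dap *dap, const struct log_region *r,
			  const uint8_t *region_header,
			  uint64_t lba, uint16_t count, uint32_t buf_phys)
{
	uint32_t len = (uint32_t)count * SECTOR_SIZE;

	/* Never write nothing, and never write the MBR. */
	if (count == 0 || lba == 0)
		return false;

	/* The whole range must sit inside the log area (overflow-safe). */
	if (lba < r->first_lba || count > r->nr_sectors ||
	    lba - r->first_lba > r->nr_sectors - count)
		return false;

	/* The area must already carry our signature. */
	if (memcmp(region_header, LOG_MAGIC, sizeof(LOG_MAGIC) - 1) != 0)
		return false;

	/* The buffer must be below 1 MiB and fit in one 64 KiB segment. */
	if (buf_phys >= LOW_MEM_LIMIT || len > LOW_MEM_LIMIT - buf_phys)
		return false;
	if ((buf_phys & 0xFu) + len > 0x10000u)
		return false;

	memset(dap, 0, sizeof(*dap));
	dap->size    = sizeof(*dap);
	dap->count   = count;
	dap->buf_off = buf_phys & 0xFu;
	dap->buf_seg = buf_phys >> 4;
	dap->lba     = lba;
	return true;
}
```

A caller would run build_log_dap() while still in long mode and drop to real mode only if it returns true, so a bad user-supplied LBA is rejected before the BIOS is ever involved. The signature check also assumes the log area was prepared once from a working system, which is one possible answer to the "wrong sector" worry rather than an established design.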