On Fri, 27 Mar 2020 19:41:16 -0300 "Guilherme G. Piccoli" <gpiccoli@xxxxxxxxxxxxx> wrote:

> Usually when kernel reach an oops condition, it's a point of no return;
> in case not enough debug information is available in the kernel splat,
> one of the last resorts would be to collect a kernel crash dump and
> analyze it. The problem with this approach is that in order to collect
> the dump, a panic is required (to kexec-load the crash kernel). When
> in an environment of multiple virtual machines, users may prefer to
> try living with the oops, at least until being able to properly
> shutdown their VMs / finish their important tasks.
>
> This patch implements a way to collect a bit more debug details when an
> oops event is reached, by printing all the CPUs backtraces through the
> usage of NMIs (on architectures that support that). The sysctl added
> (and documented) here was called "oops_all_cpu_backtrace", and when
> set will (as the name suggests) dump all CPUs backtraces.
>
> Far from ideal, this may be the last option though for users that for
> some reason cannot panic on oops. Most of times oopses are clear enough
> to indicate the kernel portion that must be investigated, but in virtual
> environments it's possible to observe hypervisor/KVM issues that could
> lead to oopses shown in other guests CPUs (like virtual APIC crashes).
> This patch hence aims to help debug such complex issues without
> resorting to kdump.
>
> ...
>
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -513,6 +513,12 @@ static inline u32 int_sqrt64(u64 x)
>  }
>  #endif
>
> +#ifdef CONFIG_SMP
> +extern unsigned int sysctl_oops_all_cpu_backtrace;
> +#else
> +#define sysctl_oops_all_cpu_backtrace 0
> +#endif /* CONFIG_SMP */
> +

hm, we have a ton of junk in kernel.h just to communicate between
sysctl.c and a handful of other files.  Perhaps one day someone can
move all that into a new sysctl-externs.h.
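
For illustration only, a rough sketch of what such a header might hold for this one sysctl. sysctl-externs.h is just the name proposed above, and the include guard and the helper oops_wants_all_cpu_backtrace() are hypothetical; only the CONFIG_SMP block comes from the quoted patch.

```c
/*
 * Hypothetical sysctl-externs.h (proposed name only, not an existing
 * header).  It would collect the declarations that sysctl.c shares
 * with a handful of other files, instead of keeping them in kernel.h.
 */
#ifndef _LINUX_SYSCTL_EXTERNS_H		/* guard name is an assumption */
#define _LINUX_SYSCTL_EXTERNS_H

/* Taken verbatim from the quoted patch. */
#ifdef CONFIG_SMP
extern unsigned int sysctl_oops_all_cpu_backtrace;
#else
#define sysctl_oops_all_cpu_backtrace 0
#endif /* CONFIG_SMP */

#endif /* _LINUX_SYSCTL_EXTERNS_H */

/*
 * Hypothetical caller, shown only to illustrate how the !SMP fallback
 * behaves: with the macro defined as 0 the test folds to a constant.
 */
static inline int oops_wants_all_cpu_backtrace(void)
{
	return sysctl_oops_all_cpu_backtrace != 0;
}
```

Built without CONFIG_SMP defined, the helper always returns 0, so callers need no #ifdef of their own.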