On Mon, Mar 23, 2020 at 07:30:35PM -0300, Guilherme G. Piccoli wrote: > Usually when kernel reach an oops condition, it's a point of no return; > in case not enough debug information is available in the kernel splat, > one of the last resorts would be to collect a kernel crash dump and > analyze it. The problem with this approach is that in order to collect > the dump, a panic is required (to kexec-load the crash kernel). When > in an environment of multiple virtual machines, users may prefer to > try living with the oops, at least until being able to properly > shutdown their VMs / finish their important tasks. > > This patch implements a way to collect a bit more debug details when an > oops event is reached, by printing all the CPUs backtraces through the > usage of NMIs (on architectures that support that). The sysctl/kernel > parameter added (and documented) here was called "oops_all_cpu_backtrace" > and when set will (as the name suggests) dump all CPUs backtraces. > > Far from ideal, this may be the last option though for users that for > some reason cannot panic on oops. Most of times oopses are clear enough > to indicate the kernel portion that must be investigated, but in virtual > environments it's possible to observe hypervisor/KVM issues that could > lead to oopses shown in other guests CPUs (like virtual APIC crashes). > This patch hence aims to help debug such complex issues without > resorting to kdump. > > Signed-off-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxxxxx> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx> -Kees > --- > > > V2: Implemented grammar suggestions from Randy, Andrew and > Matthew. Thanks in advance for the reviews! > Cheers, > > Guilherme > > > .../admin-guide/kernel-parameters.txt | 8 +++++++ > Documentation/admin-guide/sysctl/kernel.rst | 17 +++++++++++++++ > include/linux/kernel.h | 6 ++++++ > kernel/panic.c | 21 +++++++++++++++++++ > kernel/sysctl.c | 11 ++++++++++ > 5 files changed, 63 insertions(+) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 7a14caac6c94..7db622028c00 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -3333,6 +3333,14 @@ > This will also cause panics on machine check exceptions. > Useful together with panic=30 to trigger a reboot. > > + oops_all_cpu_backtrace= > + [KNL] Should kernel generate backtraces on all cpus > + when oops occurs - this should be a last measure resort > + in case a kdump cannot be collected, for example. > + Defaults to 0 and can be controlled by the sysctl > + kernel.oops_all_cpu_backtrace. > + Format: <integer> > + > page_alloc.shuffle= > [KNL] Boolean flag to control whether the page allocator > should randomize its free lists. The randomization may > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst > index 8b4ff69d2348..8660001d3a3e 100644 > --- a/Documentation/admin-guide/sysctl/kernel.rst > +++ b/Documentation/admin-guide/sysctl/kernel.rst > @@ -57,6 +57,7 @@ show up in /proc/sys/kernel: > - msgmnb > - msgmni > - nmi_watchdog > +- oops_all_cpu_backtrace > - osrelease > - ostype > - overflowgid > @@ -572,6 +573,22 @@ numa_balancing_scan_size_mb is how many megabytes worth of pages are > scanned for a given scan. > > > +oops_all_cpu_backtrace: > +================ > + > +If this option is set, the kernel will send an NMI to all CPUs to dump > +their backtraces when an oops event occurs. It should be used as a last > +resort in case a panic cannot be triggered (to protect VMs running, for > +example) or kdump can't be collected. This file shows up if CONFIG_SMP > +is enabled. > + > +0: Won't show all CPUs backtraces when an oops is detected. > +This is the default behavior. > + > +1: Will non-maskably interrupt all CPUs and dump their backtraces when > +an oops event is detected. > + > + > osrelease, ostype & version: > ============================ > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 0d9db2a14f44..6cd00257b572 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -513,6 +513,12 @@ static inline u32 int_sqrt64(u64 x) > } > #endif > > +#ifdef CONFIG_SMP > +extern unsigned int sysctl_oops_all_cpu_backtrace; > +#else > +#define sysctl_oops_all_cpu_backtrace 0 > +#endif /* CONFIG_SMP */ > + > extern void bust_spinlocks(int yes); > extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */ > extern int panic_timeout; > diff --git a/kernel/panic.c b/kernel/panic.c > index b69ee9e76cb2..73c340418575 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -36,6 +36,24 @@ > #define PANIC_TIMER_STEP 100 > #define PANIC_BLINK_SPD 18 > > +#ifdef CONFIG_SMP > +/* > + * Should we dump all CPUs backtraces in an oops event? > + * Defaults to 0, can be changed either via cmdline or sysctl. > + */ > +unsigned int __read_mostly sysctl_oops_all_cpu_backtrace; > + > +static int __init oops_backtrace_setup(char *str) > +{ > + int rc = kstrtouint(str, 0, &sysctl_oops_all_cpu_backtrace); > + > + if (rc) > + return rc; > + return 1; > +} > +__setup("oops_all_cpu_backtrace=", oops_backtrace_setup); > +#endif /* CONFIG_SMP */ > + > int panic_on_oops = CONFIG_PANIC_ON_OOPS_VALUE; > static unsigned long tainted_mask = > IS_ENABLED(CONFIG_GCC_PLUGIN_RANDSTRUCT) ? (1 << TAINT_RANDSTRUCT) : 0; > @@ -515,6 +533,9 @@ void oops_enter(void) > /* can't trust the integrity of the kernel anymore: */ > debug_locks_off(); > do_oops_enter_exit(); > + > + if (sysctl_oops_all_cpu_backtrace) > + trigger_all_cpu_backtrace(); > } > > /* > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 238f268de486..1ac31d9d5b7e 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -813,6 +813,17 @@ static struct ctl_table kern_table[] = { > .proc_handler = proc_dointvec, > }, > #endif > +#ifdef CONFIG_SMP > + { > + .procname = "oops_all_cpu_backtrace", > + .data = &sysctl_oops_all_cpu_backtrace, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax, > + .extra1 = SYSCTL_ZERO, > + .extra2 = SYSCTL_ONE, > + }, > +#endif /* CONFIG_SMP */ > { > .procname = "pid_max", > .data = &pid_max, > -- > 2.25.1 > -- Kees Cook