[PATCH v2] kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path

dzickus@xxxxxxxxxx (Don Zickus) · Mon, 23 Mar 2015 12:01:36 -0400

On Mon, Mar 23, 2015 at 10:31:58AM -0400, Vivek Goyal wrote:
> > > I think one of the motivations behind this patch was the call to kmsg_dump().
> > > Some vendors have been wanting to have the capability to save kernel logs
> > > to some NVRAM before transition to second kernel happens. Their argument
> > > is that kdump does not succeed all the time and if kdump does not succeed
> > > then at least they have something to work with (kernel logs retrieved
> > > from pstore interface).
> > 
> > Doesn't pstore attach itself to printk? AFAICS it does:
> > 
> >  fs/pstore/platform.c:   register_console(&pstore_console);
> > 
> > so the printk log leading up to and including the crash should be 
> > available, regardless of this patch. What am I missing?
> 
> That's a good point. I was not aware of it. I am Ccing Don Zickus as
> he has spent some time on this in the past.

Hi,

I will throw my two cents in here, though I expect Daisuke to provide better
info.

A number of years ago when I was helping work through some of the birthing
pains of the backend for pstore, we didn't have console support.  I don't
think it made sense for x86 either because:

- lack of space in nvram (for large logs)
- you could mark space for deletion, but space was only recovered on reboot
- the state machine would be slow to write (though mmap might have been
  faster)

Looking through the history of who introduced register_console, it looks
like it was the ARM folks.  They might have a better implementation for a
backend that does not have the above limitations.  I don't know.

> 
> Masami, would you have thoughts on this? IIRC, one reason kmsg_dump()
> was written was so that one could dump kernel messages to NVRAM. If one
> could simply register pstore as a console, then how will kmsg_dump()
> continue to be useful?
> 
> > 
> > > Not that I agree fully with this, as a problem might happen while we 
> > > try to run panic_notifiers or kmsg_dump hooks and we never transition 
> > > into the kdump kernel.
> > 
> > btw., this is the big problem with 'notifiers' in general: they are 
> > opaque with barely any semantics defined, and a source of constant 
> > confusion.
> 
> Agreed. That's the reason Eric never liked the idea of letting panic
> notifiers run before crash_kexec().
> 
> > 
> > > And it has been literally years since some developers started 
> > > pushing to allow panic notifiers to run before crash_kexec(). 
> > > Eric Biederman has been pushing back, saying it reduces the 
> > > reliability of kdump operation so this is not acceptable.
> > 
> > So what do those notifiers do?

I think it was just philosophical differences.  kexec on panic is a
complicated path and a bunch of stuff could go wrong.  As insurance in case
kexec on panic did not work, some companies wanted to capture info early,
before the kexec, so they have _something_ to use for a debug analysis.
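
To make the trade-off concrete, here is a rough userspace model of the
ordering being argued about.  It is only a sketch, not kernel code: the step
names, model_panic() and the notifier_hangs flag are all made up for
illustration, and it assumes a crash kernel is loaded so that crash_kexec()
does not return.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the steps discussed in this thread. */
enum step { STEP_CRASH_KEXEC, STEP_NOTIFIERS, STEP_KMSG_DUMP };

struct trace {
    enum step steps[4];   /* steps in the order the model ran them */
    size_t n;             /* number of recorded steps */
};

static void record(struct trace *t, enum step s)
{
    if (t->n < sizeof(t->steps) / sizeof(t->steps[0]))
        t->steps[t->n++] = s;
}

/*
 * Model of the panic path.  Returns true if the model reaches the kdump
 * kernel.  notifier_hangs simulates a notifier that never returns because
 * the system is already broken.
 */
static bool model_panic(bool post_notifiers, bool notifier_hangs,
                        struct trace *t)
{
    t->n = 0;

    if (!post_notifiers) {
        /* Default ordering: jump to the kdump kernel before anything else. */
        record(t, STEP_CRASH_KEXEC);
        return true;
    }

    /* crash_kexec_post_notifiers ordering: notifiers and log dump first. */
    record(t, STEP_NOTIFIERS);
    if (notifier_hangs)
        return false;     /* stuck here, kdump never happens */

    record(t, STEP_KMSG_DUMP);
    record(t, STEP_CRASH_KEXEC);
    return true;
}
```

In this model, a hanging notifier only costs you the dump when
post_notifiers is set, which is the reliability concern.  The flip side is
that with post_notifiers clear, the notifier and kmsg_dump steps never run
at all, which is the "no insurance" complaint.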

Vivek, Matthew Garrett and I argued that people should give us failure cases
and we would fix kexec on panic instead.  I think it took so long for kexec
on panic to stabilize that companies still do not trust it.  Just my guess.

Vivek provided examples of what folks were doing with the notifiers.

> 
> IIRC, two main reasons had come in the past.
> 
> - In a cluster of nodes, people wanted to send some sort of notification
>   to the main server that a node has crashed and should not be fenced off,
>   as it might be saving a dump.
> 
> - And saving kernel logs to non-volatile storage.
> 
> There might be more that I am not aware of. Hatayama and
> Masami, can you shed more light on this?

Cheers,
Don