Re: capturing crash dumps

Patrick Donnelly <pdonnell@xxxxxxxxxx> · Mon, 19 Feb 2018 12:52:45 -0800

On Mon, Feb 19, 2018 at 12:22 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> What if we update the segv crash handler to, in addition to dumping the
> recent log and stack trace to the log file, also
>
>  - write the same information to a standalone file, e.g.
>     /var/lib/ceph/crashes/$type.$id/$timestamp
>  - make the daemon check for previous crashes on startup, and report them
> to the mgr
>  - make the mgr keep some record of previous crashes (if not the full log,
> just the timestamp so we know when it happened)
>     - index/fingerprint by stack trace?
>  - surface a health warning for recent crashes?
>  - make an opt-in mgr function that works similar to python's sentry: post
> the crash report to some central archive where developers will hear about
> it.

+1
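
To make the first few items concrete, here is a rough sketch of the
on-disk layout, the startup scan, and a stack-trace fingerprint. It is
Python only for brevity; a real segv handler would have to do this in
the daemon with async-signal-safe calls. The function names, the JSON
report format, and the address-stripping rule are all assumptions made
for illustration, not anything that exists in Ceph today.

```python
import hashlib
import json
import re
import time
from pathlib import Path

# Root directory taken from the proposal: /var/lib/ceph/crashes/$type.$id/$timestamp
CRASH_ROOT = Path("/var/lib/ceph/crashes")


def fingerprint(stack_lines):
    """Hash a stack trace with hex addresses and offsets stripped.

    Assumption: removing every 0x... token is enough to make the same
    crash produce the same key across runs with different load addresses.
    """
    normalized = [re.sub(r"0x[0-9a-fA-F]+", "", line).strip() for line in stack_lines]
    return hashlib.sha256("\n".join(normalized).encode()).hexdigest()[:16]


def write_crash_report(daemon_type, daemon_id, stack_lines, recent_log,
                       root=CRASH_ROOT, now=None):
    """Write one standalone report file and return its path."""
    # now=None means "current time"; a fixed value is handy for testing.
    ts = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    crash_dir = Path(root) / f"{daemon_type}.{daemon_id}"
    crash_dir.mkdir(parents=True, exist_ok=True)
    report = {
        "timestamp": ts,
        "fingerprint": fingerprint(stack_lines),
        "stack": list(stack_lines),
        "recent_log": list(recent_log),
    }
    path = crash_dir / ts
    path.write_text(json.dumps(report, indent=2))
    return path


def pending_crashes(daemon_type, daemon_id, root=CRASH_ROOT):
    """Return reports left by earlier runs, oldest first.

    A daemon would call this on startup and forward the result to the mgr.
    """
    crash_dir = Path(root) / f"{daemon_type}.{daemon_id}"
    if not crash_dir.is_dir():
        return []
    return [json.loads(p.read_text())
            for p in sorted(crash_dir.iterdir()) if p.is_file()]
```

The mgr could then group reports by the "fingerprint" field and keep
only the timestamps per group if storing full logs is too much.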

There was a very useful Google project, which I can't find anymore,
that let the segfault handler create a coredump and save it to a file
(by forking a helper process, I believe). Relying on the operating
system's kernel core dump configuration sucks, and it'd be nice to just
do our own thing and collect coredumps persistently.

With that, we could even collect coredumps at the ceph-mgr for basic
processing, such as generating a backtrace using a ceph daemon
executable with debugging symbols. That backtrace would be
significantly more useful when sending a crash report to the central
archive.
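
As a sketch of that processing step, something like the following
could run wherever the core and a matching executable with debug
symbols are both available. It just shells out to gdb in batch mode;
the helper names and the choice of gdb are my assumptions, and the
paths in the usage note are made up.

```python
import shutil
import subprocess


def backtrace_command(executable, core_path, gdb="gdb"):
    """Build a non-interactive gdb invocation that prints every thread's backtrace."""
    return [gdb, "--batch", "-ex", "thread apply all bt",
            str(executable), str(core_path)]


def generate_backtrace(executable, core_path, timeout=120):
    """Run gdb on a collected core.

    Returns gdb's stdout, or None if gdb is not installed or exits non-zero.
    """
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        backtrace_command(executable, core_path),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else None
```

For example, generate_backtrace("/usr/bin/ceph-osd", "/tmp/core.osd.3")
would return the text to attach to the crash report, provided the
executable matches the build that produced the core.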

-- 
Patrick Donnelly