On Mon, Feb 19, 2018 at 12:22 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> What if we update the segv crash handler to, in addition to dumping the
> recent log and stack trace to the log file, also
>
> - write the same information to a standalone file, e.g.
>   /var/lib/ceph/crashes/$type.$id/$timestamp
> - make the daemon check for previous crashes on startup, and report them
>   to the mgr
> - make the mgr keep some record of previous crashes (if not the full log,
>   just the timestamp so we know when it happened)
> - index/fingerprint by stack trace?
> - surface a health warning for recent crashes?
> - make an opt-in mgr function that works similarly to python's sentry: post
>   the crash report to some central archive where developers will hear about
>   it.

+1

There was a very useful Google project, which I can no longer find, that
let the segfault handler create a coredump and save it to a file (by
forking a helper process, I believe). Relying on the operating system
kernel's core dump configuration sucks, and it'd be nice to just do our own
thing to collect coredumps persistently.

With that, we could even collect coredumps at the ceph-mgr for basic
processing, such as generating a backtrace using a ceph daemon executable
with debugging symbols. That backtrace would be significantly more useful
when sending a crash report to the central archive.

--
Patrick Donnelly
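
For illustration only, here is a rough Python sketch of two of the ideas above: scanning for previous crash files on startup and fingerprinting them by stack trace. It is not existing Ceph code. The names `fingerprint` and `pending_crashes` are hypothetical, and it assumes each file under `<crash_root>/<type>.<id>/<timestamp>` holds plain stack trace text, which is a guess at the dump format. Hex addresses are stripped before hashing so that the same crash site hashes the same way across runs.

```python
import hashlib
import re
from pathlib import Path

# Assumption: addresses and offsets appear as hex literals such as
# 0x7f3a2c1b or +0x1f in the dumped stack trace.
_ADDR = re.compile(r"\+?0x[0-9a-fA-F]+")


def fingerprint(stack_trace):
    """Hash a stack trace after stripping hex addresses and offsets, so
    the same sequence of frames yields the same fingerprint across runs."""
    frames = []
    for line in stack_trace.splitlines():
        frame = _ADDR.sub("", line).strip()
        if frame:
            frames.append(frame)
    return hashlib.sha1("\n".join(frames).encode("utf-8")).hexdigest()


def pending_crashes(crash_root, daemon):
    """Return one record per crash file found for this daemon.

    Assumes the layout <crash_root>/<type>.<id>/<timestamp>, with the
    file name serving as the crash timestamp (hypothetical format).
    """
    daemon_dir = Path(crash_root) / daemon
    if not daemon_dir.is_dir():
        return []
    records = []
    for path in sorted(daemon_dir.iterdir()):
        if path.is_file():
            records.append({
                "daemon": daemon,
                "timestamp": path.name,
                "fingerprint": fingerprint(path.read_text(errors="replace")),
            })
    return records
```

A daemon could call `pending_crashes("/var/lib/ceph/crashes", "osd.3")` on startup and hand the records to the mgr, which could then group them by `fingerprint` to decide whether to raise a health warning. How the records actually reach the mgr is left open here.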