Re: is the watchdog useful?


 



On Tue, Oct 22, 2019 at 12:34:45PM +0200, Umut Tezduyar Lindskog wrote:
> I am curious, Zbigniew, how you find out whether the coredump was from a
> starved process?

A very common case is systemd-journald getting SIGABRT while it is in a
read(), write(), or similar syscall. Another case is when
systemd-udevd workers get SIGABRT while doing open() on a device.

> This is common for our embedded devices. I didn't think it is common for
> desktop too.


> It is really useful for getting coredumps on deadlocked applications. For
> that reason I don't think it is good to remove this functionality
> completely.

Yes, I never suggested removing it completely. I'm just saying that for
the type of systems that Fedora targets, I don't recall any actual deadlock.
For more specialized systems, where the workload is more predictable,
it makes sense to have the watchdog.
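
For context, this is roughly what a service has to do to take part in the
watchdog. It is only a minimal sketch of the sd_notify() datagram protocol
using the Python standard library, not code from systemd. It assumes the
unit sets WatchdogSec=, so that systemd exports WATCHDOG_USEC and
NOTIFY_SOCKET to the service. The function names are made up for
illustration, and the WATCHDOG_PID check done by sd_watchdog_enabled() is
left out.

```python
import os
import socket


def watchdog_interval_seconds(environ=os.environ):
    """Return the keep-alive interval in seconds, or None without a watchdog.

    WATCHDOG_USEC holds the timeout in microseconds. Pinging at half the
    timeout is the usual safety margin.
    """
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2


def notify(message, environ=os.environ):
    """Send one notification datagram; return False if no socket is set."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        address = b"\0" + path[1:].encode()
    else:
        address = path.encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True
```

A service would call notify("WATCHDOG=1") from its main loop at least once
per watchdog_interval_seconds(). If the loop is starved of CPU or blocked
in a syscall, the pings stop and the manager sends SIGABRT once the timeout
expires, which is exactly the situation described above.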

There might be cases where the kernel is dead-locked internally, and e.g.
open() or modprobe() never returns. For those cases it might be useful to
get the backtrace, but actually killing the process and/or storing the
coredump is not useful.

Zbyszek

> 
> Umut
> 
> On Mon, Oct 21, 2019 at 7:51 PM Zbigniew Jędrzejewski-Szmek <
> zbyszek@xxxxxxxxx> wrote:
> 
> > In principle, the watchdog for services is nice. But in practice it seems
> > to bring only grief. The Fedora bugtracker is full of automated reports of
> > ABRTs, and of those that were fired by the watchdog, pretty much 100% are
> > bogus, in the sense that the machine was resource starved and the watchdog
> > fired.
> >
> > There are a few downsides to the watchdog killing the service:
> > 1. if it is something like logind, it is possible that it will cause
> >    user-visible failure of other services
> > 2. restarting of the service causes additional load on the machine
> > 3. coredump handling causes additional load on the machine, quite
> >    significant
> > 4. those failures are reported in bugtrackers and waste everyone's time.
> >
> > I had the following ideas:
> > 1. disable coredumps for watchdog abrts: systemd could set some flag
> >    on the unit or otherwise notify systemd-coredump about this, and it
> >    could just log the occurrence but not dump the core file.
> > 2. generally disable watchdogs and make them opt in. We have
> >    'systemd-analyze service-watchdogs', and we could make the default
> >    configurable to "yes|no".
> >
> > What do you think?
> > Zbyszek
> > _______________________________________________
> > systemd-devel mailing list
> > systemd-devel@xxxxxxxxxxxxxxxxxxxxx
> > https://lists.freedesktop.org/mailman/listinfo/systemd-devel



