Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

Zbigniew Jędrzejewski-Szmek <zbyszek@xxxxxxxxx> · Mon, 9 Jan 2023 20:35:30 +0000

On Mon, Jan 09, 2023 at 11:04:11AM +0100, Lennart Poettering wrote:
> On Fr, 06.01.23 11:06, Michael Catanzaro (mcatanzaro@xxxxxxxxxx) wrote:
> 
> > Maybe instead of SIGKILL, we should send SIGQUIT. That way abrt
> > should complain next time you boot and users will have an opportunity to
> > report bugs to the package maintainer, instead of the problem being forever
> > ignored. Killing things silently makes it real hard to report bugs. And as a
> > bonus, the core dump should actually show what the process was doing at the
> > time it got killed. The more I think about it, the better this sounds.
> > Currently this can be configured using FinalKillSignal=SIGQUIT, so we'd just
> > need to figure out the right place to put that.
>
> > systemd already has a configuration option for this so we'd just have to
> > turn it on.
> 
> Don't use FinalKillSignal=SIGQUIT.
> 
> Use TimeoutStopFailureMode=abort instead. (which covers more ground,
> and sends SIGABRT rather than SIGQUIT on failure, which has the same
> effect: coredumping).

I guess we could add DefaultTimeoutStopFailureMode= setting and a
-Ddefault-default-timeout-stop-failure-mode= compile-time default for it.

Barring that, it's possible to do per-type drop-ins:
/usr/lib/systemd/system/{service,scope,mount}.d/10-kill-mode.conf
or so, maybe for more types. But that'd be harder to override and more
messy in general.
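
As a rough sketch, the service variant of such a drop-in could look like
this. The file name is just the one suggested above, and I'm only showing
the [Service] case; whether the setting is honoured for scope and mount
units is an assumption that would need checking:

  # /usr/lib/systemd/system/service.d/10-kill-mode.conf
  [Service]
  TimeoutStopFailureMode=abort

A local admin could override it with a file of the same name (or one that
sorts later) under /etc/systemd/system/service.d/.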

> That said: dumping core is potentially extremely expensive (web
> browsers have gigabytes of virtual memory that we might end up
> processing and compressing). Quite often the stuff that is slow when
> exiting is also the stuff that is expensive to dump.
> 
> Hence, I am not sure you'll gain that much via this mechanism: you cut
> a long operation short and then execute a long operation as a result. You
> might end up delaying things more than you hope to shorten them.

That is true, but I don't think that it's an actual reason to not do this. The
job for the coredump gets a separate timeout, so the coredump would generally
run successfully during shutdown.

It'll obviously delay the shutdown, making the whole thing even more painful.
I assume that we would treat any such cases as bugs. If we get the coredumps
reported through abrt, it'd indeed make it easier to diagnose those cases.

--

Digging into some details:

It seems that coredumping usually takes a few seconds at most, even with
gigabytes of RSS. I won't cite specific numbers, since that's just a very
biased sample on my laptop gathered via
  journalctl --grep 'systemd-coredump@.*: Consumed'

If the default stop timeout is set to 15s, we would probably have to raise the
timeout for systemd-coredump@.service to something higher. This would let
the coredump process run successfully in most cases.
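
Roughly, and assuming TimeoutStopSec= is the knob that matters here, that
could be a drop-in for the template unit. The file name and the 2min value
are placeholders, not measured or agreed numbers:

  # /usr/lib/systemd/system/systemd-coredump@.service.d/10-timeout.conf
  [Service]
  TimeoutStopSec=2min

This would sit alongside DefaultTimeoutStopSec=15s in system.conf, so only
the coredump helper gets the longer limit.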

Zbyszek