On Mon, Jan 09, 2023 at 11:04:11AM +0100, Lennart Poettering wrote:
> On Fr, 06.01.23 11:06, Michael Catanzaro (mcatanzaro@xxxxxxxxxx) wrote:
>
> > Maybe instead of SIGKILL, we should send SIGQUIT instead. That way abrt
> > should complain next time you boot and users will have an opportunity to
> > report bugs to the package maintainer, instead of the problem being forever
> > ignored. Killing things silently makes it real hard to report bugs. And as a
> > bonus, the core dump should actually show what the process was doing at the
> > time it got killed. The more I think about it, the better this sounds.
> > Currently this can be configured using FinalKillSignal=SIGQUIT, so we'd just
> > need to figure out the right place to put that.
> >
> > systemd already has a configuration option for this so we'd just have to
> > turn it on.
>
> Don't use FinalKillSignal=SIGQUIT.
>
> Use TimeoutStopFailureMode=abort instead. (which covers more ground,
> and sends SIGABRT rather than SIGQUIT on failure, which has the same
> effect: coredumping).

I guess we could add a DefaultTimeoutStopFailureMode= setting and a
-Ddefault-default-timeout-stop-failure-mode= compile-time default for it.

Barring that, it's possible to do per-type drop-ins:
/usr/lib/systemd/system/{service,scope,mount}.d/10-kill-mode.conf
or so, maybe for more types. But that'd be harder to override and
messier in general.

> That said: dumping core is potentially extremely expensive (web
> browsers have gigabytes of virtual memory that we might end up
> processing and compressing). Quite often the stuff that is slow when
> exiting is also the stuff that is expensive to dump.
>
> Hence, I am not sure you'll gain that much via this mechanism: you cut
> a long operation short and then execute long operation as result. You
> might end delaying things more than you hope shortening them.

That is true, but I don't think that it's an actual reason to not do this.
The job for the coredump gets a separate timeout, so the coredump would
generally run successfully during shutdown. It'll obviously delay the
shutdown, making the whole thing even more painful. I assume that we
would treat any such cases as bugs. If we get the coredumps reported
through abrt, it'd indeed make it easier to diagnose those cases.

--

Digging into some details:

It seems that coredumping usually takes a few seconds at most, even with
gigabytes of RSS. I won't cite specific numbers, since that's just a
very biased sample on my laptop gathered via
  journalctl --grep 'systemd-coredump@.*: Consumed'

If the default stop timeout is set to 15s, we would probably have to raise
the timeout for the systemd-coredump@.service to something higher.
This would let the coredump process run successfully in most cases.

Zbyszek
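
For illustration, the per-type drop-in discussed in this message might look
roughly like the sketch below. Only the service type is shown; the first
file name follows the path suggested in the message, and whether scope and
mount units accept the same setting is not verified here. The second
fragment is an assumption: it supposes that TimeoutStopSec= is the timeout
that would need raising for systemd-coredump@.service, and both the
10-timeout.conf file name and the 2min value are hypothetical placeholders,
not recommendations.

```ini
# File 1: /usr/lib/systemd/system/service.d/10-kill-mode.conf
# Type-level drop-in, applied to all service units. When the stop
# timeout expires, the manager sends SIGABRT (so a core dump is
# produced) rather than proceeding directly to SIGKILL.
[Service]
TimeoutStopFailureMode=abort

# File 2 (hypothetical name and value):
# /etc/systemd/system/systemd-coredump@.service.d/10-timeout.conf
# Assumes TimeoutStopSec= is the knob meant by "the timeout for the
# systemd-coredump@.service"; 2min is only a placeholder.
[Service]
TimeoutStopSec=2min
```

After installing such files, systemctl daemon-reload makes the manager pick
them up, and systemctl cat on an affected unit lists the drop-ins that apply
to it.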