Re: Detecting Systemd crash

František Šumšal <frantisek@xxxxxxxxx> · Mon, 5 Feb 2024 12:43:14 +0100

On 2/3/24 16:55, Álvaro Cebrián Juan wrote:
Great question!

I am also very interested in detecting systemd crashes: I have experienced them recently and have been asked to come up with a way to react when PID 1 crashes.
In fact, in my recent experience, a journald crash was enough to leave the system in an unreliable/degraded state in which some top-level applications worked while others didn't.

So, adding to David's first question: I need to detect systemd and journald crashes and then trigger a `systemctl reboot --force --force` command.

You can tell systemd to do just that, by setting CrashReboot=yes in system.conf [0][1]. It defaults to 'no' to avoid reboot loops.

[0] https://www.freedesktop.org/software/systemd/man/latest/systemd-system.conf.html#LogColor=
[1] https://www.freedesktop.org/software/systemd/man/latest/systemd.html#systemd.crash_reboot
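
For illustration, the setting could live in a drop-in instead of being edited into system.conf directly. This is only a sketch: the file name below is an arbitrary choice, not something systemd requires.

```ini
# /etc/systemd/system.conf.d/10-crash-reboot.conf  (hypothetical file name)
[Manager]
# Reboot automatically when PID 1 crashes; the default is 'no' to avoid reboot loops.
CrashReboot=yes
```

The same switch can be given on the kernel command line as systemd.crash_reboot, per [1].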

I have also read that the Linux Magic System Request Key (SysRq) can help in such scenarios, but I don't know how it works.

Any help would be much appreciated.
Thank you.

Some related links:
https://news.ycombinator.com/item?id=19023695
https://news.ycombinator.com/item?id=36873927
https://www.kernel.org/doc/html/latest/admin-guide/sysrq.html

On Sat, 3 Feb 2024 at 16:14, David Timber (<dxdt@xxxxxxxxxxxx>) wrote:

    Systemd crashed on me the other day. I was writing some Systemd units
    and running daemon-reload every time I wanted to test them. Not the
    best way to go about it, I know. My bad for abusing Systemd to the
    point of crashing. Perhaps it was just a bit flip that caused this.

         systemd[2368]: Assertion 'path_is_absolute(p)' failed at
         src/basic/chase.c:628, function chase(). Aborting.
         systemd[1]: Assertion 'path_is_absolute(p)' failed at
         src/basic/chase.c:628, function chase(). Aborting.
         systemd[1]: Caught <ABRT> from our own process.
         systemd-coredump[32497]: Due to PID 1 having crashed coredump
         collection will now be turned off.
         systemd-coredump[32497]: [🡕] Process 32496 (systemd) of user 0
         dumped core.
         systemd[1]: Caught <ABRT>, dumped core as pid 32496.
         systemd[1]: Freezing execution.

         ...

         systemd-journald[871]: Failed to send stream file descriptor to
         service manager: Transport endpoint is not connected

    I didn't even bother trying to produce a stack trace. I can get on that
    if anyone wants it. My machine started doing some weird things: Firefox
    could not do Ajax properly but could still go to a new page, Chromium
    could not create a new tab, all the text editors worked just fine, and
    all the systemctl commands timed out. So basically, I was using Linux
    without fork(). Anyway.
    Well, I think any software can crash for any reason whatsoever. The
    problem with Systemd I realised from this incident is that I had no way
    of knowing that Systemd had crashed until I opened up the journal and
    kernel logs and saw that Systemd had crashed some time ago. In this
    particular incident, Systemd caught the signal and decided to just
    freeze. No idea why you'd want that because if it had just crashed, the
    kernel would have just panicked and I would have realised something went
    wrong.

    1: So I decided that I need some sort of "watchdog" that warns me when
    something like this happens. It would use dbus to poll the status of
    the Systemd process, and it could be a GUI app running under a seat, a
    daemon that writes a warning message using `wall`, or one that sends
    mail using a primed MUA process. I wonder if someone has already had
    the same idea and gone on to make one.

    2: How do I get Systemd to freeze to test such a program? I mean, if I
    kill Systemd, the kernel would crash so I have to somehow tell Systemd
    to freeze?
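
On David's first point, here is a minimal sketch of such a watchdog in Python. It does not use dbus. Instead it swaps in a simpler probe, running `systemctl is-system-running` with a timeout, on the assumption that a frozen PID 1 shows up as systemctl calls that never return (as in the report above). The function names, the 10-second timeout, the 30-second interval and the three-consecutive-failures rule are all illustrative choices, not anything systemd prescribes.

```python
import subprocess
import time

# Assumption: a frozen PID 1 makes systemctl hang until it times out.
# Any answer at all (even "degraded", which exits non-zero) counts as alive.
PROBE_CMD = ["systemctl", "is-system-running"]


def manager_responds(cmd=PROBE_CMD, timeout=10.0):
    """Return True if the probe command finishes within `timeout` seconds."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return True


def wall_alert(message):
    """Broadcast a warning to all logged-in terminals using wall(1)."""
    subprocess.run(["wall", message])


def watch(interval=30.0, failures_needed=3, probe=manager_responds,
          alert=wall_alert, sleep=time.sleep, max_rounds=None):
    """Poll `probe`; call `alert` once after N consecutive failures.

    `max_rounds` exists only so the loop can be bounded in tests;
    the default of None means run forever.
    """
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if probe():
            failures = 0  # manager answered, reset the streak
        else:
            failures += 1
            if failures == failures_needed:
                alert("PID 1 (systemd) is not responding; it may have crashed")
        sleep(interval)
```

Calling `watch()` starts the loop with the defaults. Passing a different `alert` callable (one that sends mail, say, or runs `systemctl reboot --force --force`) changes the reaction without touching the polling logic.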