Re: How to debug systemd services failing to start with 11/SEGV?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

gave it a try and bisected systemd … see below.

Am Wed, Apr 10, 2024 at 12:37:39PM +0200 schrieb Alexander Dahl:
> Hello everone,
> 
> I thought I knew how to let the kernel create coredumps … (see below).
> 
> Am Tue, Apr 09, 2024 at 04:21:21PM +0200 schrieb Alexander Dahl:
> > Hello Lennart,
> > 
> > thanks for your quick reply, see below.
> > 
> > Am Tue, Apr 09, 2024 at 03:53:24PM +0200 schrieb Lennart Poettering:
> > > On Di, 09.04.24 14:42, Alexander Dahl (ada@xxxxxxxxxxx) wrote:
> > > 
> > > > Hello everyone,
> > > >
> > > > I'm currently trying to build a firmware for an embedded device and
> > > > running into trouble because systemd seems to crash.  The BSP is
> > > > based on pengutronix DistroKit (master) built with ptxdist and the
> > > > target is the Microchip SAM9X60-Curiosity board, which is arm v5te
> > > > architecture (that board is not part of DistroKit, support for that is
> > > > in an upper layer of mine not public yet (?)).
> > > >
> > > > Everything is quite recent, building systemd version 255.2 currently.
> > > > On startup I get messages like this (this is the first one, later on
> > > > there are lot more, all with the same status):
> > > >
> > > >    [   11.175650] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
> > > >    [   11.239679] systemd[1]: systemd-journald.service: Failed with result 'signal'.
> > > >    [   11.292640] systemd[1]: Failed to start systemd-journald.service.
> > > >    [FAILED] Failed to start systemd-journald.service.
> > > >    See 'systemctl status systemd-journald.service' for details.
> > > >
> > > > The system drops me on a shell later, where I can run the above
> > > > mentioned command, which gives:
> > > >
> > > >     ~ # systemctl status systemd-journald.service
> > > >     x systemd-journald.service - Journal Service
> > > >          Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static)
> > > >          Active: failed (Result: signal) since Tue 2024-04-09 11:44:52 UTC; 11min a>
> > > >     TriggeredBy: x systemd-journald-dev-log.socket
> > > >                  * systemd-journald-audit.socket
> > > >                  x systemd-journald.socket
> > > >            Docs: man:systemd-journald.service(8)
> > > >                  man:journald.conf(5)
> > > >         Process: 197 ExecStart=/usr/lib/systemd/systemd-journald (code=killed, sign>
> > > >        Main PID: 197 (code=killed, signal=SEGV)
> > > >        FD Store: 0 (limit: 4224)
> > > >             CPU: 330ms
> > > >
> > > > This does not help me much.  Other services crashing: systemd-udevd
> > > > and systemd-timesyncd, also with status 11/SEGV which is segmentation
> > > > fault, right?
> > > 
> > > Yes.
> > > 
> > > > I had this board running with an older version of systemd, but I can
> > > > not remember which was the last good version.
> > > >
> > > > Could anyone give me a hint please how to debug this?
> > > 
> > > "coredumpctl gdb" should get open the most recent backtrace for you.
> > 
> > This gives:
> > 
> >     ~ # coredumpctl gdb
> >     No journal files were found.
> >     No match found.
> > 
> > > The coredump should also show up in the logs with a backtrace.
> > 
> > I only have serial console output.  journald is crashing.  With
> > `dmesg` I see systemd messages in kernel log, but no backtrace.
> > gdbserver is installed on target, no gdb currently.  Trying to get a
> > coredump tomorrow.
> 
> Well I tried for like three hours now, and I could not get a coredump
> from journald nor udevd.  Tried with the systemd way of creating
> coredumps, which means /usr/lib/systemd/systemd-coredump is registered
> as kernel.core_pattern with settings from
> /usr/lib/sysctl.d/50-coredump.conf and I even made sure
> systemd-coredump.socket is active (started from the maintenance
> shell):
> 
>     ~ # systemctl status systemd-coredump.socket
>     * systemd-coredump.socket - Process Core Dump Socket
>          Loaded: loaded (/usr/lib/systemd/system/systemd-coredump.socket; static)
>          Active: active (listening) since Tue 2024-04-09 11:46:21 UTC; 2min 44s ago
>            Docs: man:systemd-coredump(8)
>          Listen: /run/systemd/coredump (SequentialPacket)
>        Accepted: 1; Connected: 0;
>          CGroup: /system.slice/systemd-coredump.socket
> 
> Started a long running process like this:
> 
>     ~ # /usr/lib/systemd/systemd-udevd -d
>     Starting systemd-udevd version 255.2
> 
> Killed it like this:
> 
>     ~ # kill -11 269
> 
> Got this on serial console (kernel log):
> 
>     [  329.282462] systemd[1]: Started systemd-coredump@1-271-0.service.
>     [  329.454739] systemd[1]: Listening on systemd-journald-dev-log.socket.
>     [  329.552429] systemd[1]: Starting systemd-journald.service...
>     [  329.953204] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
>     [  329.966458] systemd[1]: systemd-journald.service: Failed with result 'signal'.
>     [  329.977845] systemd[1]: Failed to start systemd-journald.service.
>     [  330.002231] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
>     [  330.092289] systemd[1]: Starting systemd-journald.service...
>     [  330.513858] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
>     [  330.533602] systemd[1]: systemd-journald.service: Failed with result 'signal'.
>     [  330.544728] systemd[1]: Failed to start systemd-journald.service.
>     [  330.570419] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2.
>     [  330.652355] systemd[1]: Starting systemd-journald.service...
>     [  331.071540] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
>     [  331.084789] systemd[1]: systemd-journald.service: Failed with result 'signal'.
>     [  331.104092] systemd[1]: Failed to start systemd-journald.service.
>     [  331.121208] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 3.
>     [  331.212357] systemd[1]: Starting systemd-journald.service...
>     [  331.648631] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
>     [  331.662108] systemd[1]: systemd-journald.service: Failed with result 'signal'.
>     [  331.673588] systemd[1]: Failed to start systemd-journald.service.
>     [  331.702297] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 4.
>     [  331.782346] systemd[1]: Starting systemd-journald.service...
>     [  332.221658] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
>     [  332.244381] systemd[1]: systemd-journald.service: Failed with result 'signal'.
>     [  332.255494] systemd[1]: Failed to start systemd-journald.service.
>     [  332.265870] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 5.
>     [  332.291243] systemd[1]: systemd-journald.service: Start request repeated too quickly.
>     [  332.299240] systemd[1]: systemd-journald.service: Failed with result 'signal'.
>     [  332.308392] systemd[1]: Failed to start systemd-journald.service.
>     [  332.315720] systemd[1]: systemd-journald.socket: Failed with result 'service-start-limit-hit'.
>     [  332.335020] systemd[1]: systemd-journald-dev-log.socket: Failed with result 'service-start-limit-hit'.
>     [  332.838500] systemd[1]: systemd-coredump@1-271-0.service: Deactivated successfully.
>     [  332.851322] systemd[1]: systemd-coredump@1-271-0.service: Consumed 1.004s CPU time.
> 
> As you can see the systemd-coredump@1-271-0.service is started, and
> systemd tries to start systemd-journald.service again and again, each
> failing with segfault.  However after that folder
> /var/lib/systemd/coredump/ is completely empty.
> 
> I tried with a different (simple) core pipe handler which works all
> the time on a non systemd system:
> 
>     ~ # cat /usr/local/bin/pipecore.sh 
>     #!/bin/sh
>     /usr/bin/xz -z -0 - > /mnt/data/tmp/cores/core-${1}-s${2}-${3}.xz
>    ~ # echo '|/usr/local/bin/pipecore.sh %e %s %t' > /proc/sys/kernel/core_pattern
> 
> For stuff not started with systemctl this successfully creates
> coredumps in my folder.  Trying `systemctl start
> systemd-journald.service` after that: nothing.  Something is
> preventing coredumps of systemd itself, and I have no idea what or
> why.  Read https://systemd.io/COREDUMP/ but it did not help me in this
> case.
> 
> Will try to go back to a working previous version now, and maybe
> bisect later if my frustration got back to a normal level.

Well, I still have no idea on that one, but bisecting systemd spit out
this changeset, after that things were segfaulting:

    7a114ed4b39e9670f6a511f3eecb6fd58274d27b is the first bad commit
    commit 7a114ed4b39e9670f6a511f3eecb6fd58274d27b
    Author: Topi Miettinen <toiwoton@xxxxxxxxx>
    Date:   Sun Nov 6 21:12:45 2022 +0200
    
        execute: use prctl(PR_SET_MDWE) for MemoryDenyWriteExecute=yes
        
        On some ARM platforms, the dynamic linker could use PROT_BTI memory protection
        flag with `mprotect(..., PROT_BTI | PROT_EXEC)` to enable additional memory
        protection for executable pages. But `MemoryDenyWriteExecute=yes` blocks this
        with seccomp filter denying all `mprotect(..., x | PROT_EXEC)`.
        
        Newly preferred method is to use prctl(PR_SET_MDWE) on supported kernels. Then
        in-kernel implementation can allow PROT_BTI as necessary, without weakening
        MDWE. In-kernel version may also be extended to more sophisticated protections
        in the future.
    
     man/systemd.exec.xml      |  8 +++++---
     src/basic/missing_prctl.h |  8 ++++++++
     src/core/execute.c        | 14 ++++++++++++++
     3 files changed, 27 insertions(+), 3 deletions(-)

Had a look at that changeset, and I can not see anything which lets
systemd crash in that function.  With debug logging on, I see this in
dmesg output:

    [  +0.004439] (journald)[90]: systemd-journald.service: Enabled MemoryDenyWriteExecute= with PR_SET_MDWE

Looks like the success return path of the changed function.  Maybe
that setting now leads to crashing as a consequence of the enabled
memory protection?

Note: platform here is 32 bit arm, namely v5te on Microchip SAM9X60
SoC.  Kernel is 6.6, maybe I did not get the kernelconfig right and
some options are not set correctly?  Or maybe those crashes are real?
Then I could need some help how to _really_ enable coredumps for
journald, udevd, and timesyncd.  Got a hint off-list to pass
'systemd.dump_core=true' to kernel cmdline, but that had no effect on
coredump creation.

Greets
Alex




[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux