Hello, gave it a try and bisected systemd … see below. Am Wed, Apr 10, 2024 at 12:37:39PM +0200 schrieb Alexander Dahl: > Hello everone, > > I thought I knew how to let the kernel create coredumps … (see below). > > Am Tue, Apr 09, 2024 at 04:21:21PM +0200 schrieb Alexander Dahl: > > Hello Lennart, > > > > thanks for your quick reply, see below. > > > > Am Tue, Apr 09, 2024 at 03:53:24PM +0200 schrieb Lennart Poettering: > > > On Di, 09.04.24 14:42, Alexander Dahl (ada@xxxxxxxxxxx) wrote: > > > > > > > Hello everyone, > > > > > > > > I'm currently trying to build a firmware for an embedded device and > > > > running into trouble because systemd seems to crash. The BSP is > > > > based on pengutronix DistroKit (master) built with ptxdist and the > > > > target is the Microchip SAM9X60-Curiosity board, which is arm v5te > > > > architecture (that board is not part of DistroKit, support for that is > > > > in an upper layer of mine not public yet (?)). > > > > > > > > Everything is quite recent, building systemd version 255.2 currently. > > > > On startup I get messages like this (this is the first one, later on > > > > there are lot more, all with the same status): > > > > > > > > [ 11.175650] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV > > > > [ 11.239679] systemd[1]: systemd-journald.service: Failed with result 'signal'. > > > > [ 11.292640] systemd[1]: Failed to start systemd-journald.service. > > > > [FAILED] Failed to start systemd-journald.service. > > > > See 'systemctl status systemd-journald.service' for details. > > > > > > > > The system drops me on a shell later, where I can run the above > > > > mentioned command, which gives: > > > > > > > > ~ # systemctl status systemd-journald.service > > > > x systemd-journald.service - Journal Service > > > > Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static) > > > > Active: failed (Result: signal) since Tue 2024-04-09 11:44:52 UTC; 11min a> > > > > TriggeredBy: x systemd-journald-dev-log.socket > > > > * systemd-journald-audit.socket > > > > x systemd-journald.socket > > > > Docs: man:systemd-journald.service(8) > > > > man:journald.conf(5) > > > > Process: 197 ExecStart=/usr/lib/systemd/systemd-journald (code=killed, sign> > > > > Main PID: 197 (code=killed, signal=SEGV) > > > > FD Store: 0 (limit: 4224) > > > > CPU: 330ms > > > > > > > > This does not help me much. Other services crashing: systemd-udevd > > > > and systemd-timesyncd, also with status 11/SEGV which is segmentation > > > > fault, right? > > > > > > Yes. > > > > > > > I had this board running with an older version of systemd, but I can > > > > not remember which was the last good version. > > > > > > > > Could anyone give me a hint please how to debug this? > > > > > > "coredumpctl gdb" should get open the most recent backtrace for you. > > > > This gives: > > > > ~ # coredumpctl gdb > > No journal files were found. > > No match found. > > > > > The coredump should also show up in the logs with a backtrace. > > > > I only have serial console output. journald is crashing. With > > `dmesg` I see systemd messages in kernel log, but no backtrace. > > gdbserver is installed on target, no gdb currently. Trying to get a > > coredump tomorrow. > > Well I tried for like three hours now, and I could not get a coredump > from journald nor udevd. Tried with the systemd way of creating > coredumps, which means /usr/lib/systemd/systemd-coredump is registered > as kernel.core_pattern with settings from > /usr/lib/sysctl.d/50-coredump.conf and I even made sure > systemd-coredump.socket is active (started from the maintenance > shell): > > ~ # systemctl status systemd-coredump.socket > * systemd-coredump.socket - Process Core Dump Socket > Loaded: loaded (/usr/lib/systemd/system/systemd-coredump.socket; static) > Active: active (listening) since Tue 2024-04-09 11:46:21 UTC; 2min 44s ago > Docs: man:systemd-coredump(8) > Listen: /run/systemd/coredump (SequentialPacket) > Accepted: 1; Connected: 0; > CGroup: /system.slice/systemd-coredump.socket > > Started a long running process like this: > > ~ # /usr/lib/systemd/systemd-udevd -d > Starting systemd-udevd version 255.2 > > Killed it like this: > > ~ # kill -11 269 > > Got this on serial console (kernel log): > > [ 329.282462] systemd[1]: Started systemd-coredump@1-271-0.service. > [ 329.454739] systemd[1]: Listening on systemd-journald-dev-log.socket. > [ 329.552429] systemd[1]: Starting systemd-journald.service... > [ 329.953204] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV > [ 329.966458] systemd[1]: systemd-journald.service: Failed with result 'signal'. > [ 329.977845] systemd[1]: Failed to start systemd-journald.service. > [ 330.002231] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1. > [ 330.092289] systemd[1]: Starting systemd-journald.service... > [ 330.513858] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV > [ 330.533602] systemd[1]: systemd-journald.service: Failed with result 'signal'. > [ 330.544728] systemd[1]: Failed to start systemd-journald.service. > [ 330.570419] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2. > [ 330.652355] systemd[1]: Starting systemd-journald.service... > [ 331.071540] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV > [ 331.084789] systemd[1]: systemd-journald.service: Failed with result 'signal'. > [ 331.104092] systemd[1]: Failed to start systemd-journald.service. > [ 331.121208] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 3. > [ 331.212357] systemd[1]: Starting systemd-journald.service... > [ 331.648631] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV > [ 331.662108] systemd[1]: systemd-journald.service: Failed with result 'signal'. > [ 331.673588] systemd[1]: Failed to start systemd-journald.service. > [ 331.702297] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 4. > [ 331.782346] systemd[1]: Starting systemd-journald.service... > [ 332.221658] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV > [ 332.244381] systemd[1]: systemd-journald.service: Failed with result 'signal'. > [ 332.255494] systemd[1]: Failed to start systemd-journald.service. > [ 332.265870] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 5. > [ 332.291243] systemd[1]: systemd-journald.service: Start request repeated too quickly. > [ 332.299240] systemd[1]: systemd-journald.service: Failed with result 'signal'. > [ 332.308392] systemd[1]: Failed to start systemd-journald.service. > [ 332.315720] systemd[1]: systemd-journald.socket: Failed with result 'service-start-limit-hit'. > [ 332.335020] systemd[1]: systemd-journald-dev-log.socket: Failed with result 'service-start-limit-hit'. > [ 332.838500] systemd[1]: systemd-coredump@1-271-0.service: Deactivated successfully. > [ 332.851322] systemd[1]: systemd-coredump@1-271-0.service: Consumed 1.004s CPU time. > > As you can see the systemd-coredump@1-271-0.service is started, and > systemd tries to start systemd-journald.service again and again, each > failing with segfault. However after that folder > /var/lib/systemd/coredump/ is completely empty. > > I tried with a different (simple) core pipe handler which works all > the time on a non systemd system: > > ~ # cat /usr/local/bin/pipecore.sh > #!/bin/sh > /usr/bin/xz -z -0 - > /mnt/data/tmp/cores/core-${1}-s${2}-${3}.xz > ~ # echo '|/usr/local/bin/pipecore.sh %e %s %t' > /proc/sys/kernel/core_pattern > > For stuff not started with systemctl this successfully creates > coredumps in my folder. Trying `systemctl start > systemd-journald.service` after that: nothing. Something is > preventing coredumps of systemd itself, and I have no idea what or > why. Read https://systemd.io/COREDUMP/ but it did not help me in this > case. > > Will try to go back to a working previous version now, and maybe > bisect later if my frustration got back to a normal level. Well, I still have no idea on that one, but bisecting systemd spit out this changeset, after that things were segfaulting: 7a114ed4b39e9670f6a511f3eecb6fd58274d27b is the first bad commit commit 7a114ed4b39e9670f6a511f3eecb6fd58274d27b Author: Topi Miettinen <toiwoton@xxxxxxxxx> Date: Sun Nov 6 21:12:45 2022 +0200 execute: use prctl(PR_SET_MDWE) for MemoryDenyWriteExecute=yes On some ARM platforms, the dynamic linker could use PROT_BTI memory protection flag with `mprotect(..., PROT_BTI | PROT_EXEC)` to enable additional memory protection for executable pages. But `MemoryDenyWriteExecute=yes` blocks this with seccomp filter denying all `mprotect(..., x | PROT_EXEC)`. Newly preferred method is to use prctl(PR_SET_MDWE) on supported kernels. Then in-kernel implementation can allow PROT_BTI as necessary, without weakening MDWE. In-kernel version may also be extended to more sophisticated protections in the future. man/systemd.exec.xml | 8 +++++--- src/basic/missing_prctl.h | 8 ++++++++ src/core/execute.c | 14 ++++++++++++++ 3 files changed, 27 insertions(+), 3 deletions(-) Had a look at that changeset, and I can not see anything which lets systemd crash in that function. With debug logging on, I see this in dmesg output: [ +0.004439] (journald)[90]: systemd-journald.service: Enabled MemoryDenyWriteExecute= with PR_SET_MDWE Looks like the success return path of the changed function. Maybe that setting now leads to crashing as a consequence of the enabled memory protection? Note: platform here is 32 bit arm, namely v5te on Microchip SAM9X60 SoC. Kernel is 6.6, maybe I did not get the kernelconfig right and some options are not set correctly? Or maybe those crashes are real? Then I could need some help how to _really_ enable coredumps for journald, udevd, and timesyncd. Got a hint off-list to pass 'systemd.dump_core=true' to kernel cmdline, but that had no effect on coredump creation. Greets Alex