Re: systemctl hangs with 249.7 systemd in yocto Honister release

Hi Michal,

We have since upgraded systemd to version 250.5, but the issue still happens.

Looking through the journal around the time the error message is first printed, I found a segmentation fault (SEGV) in systemd-udevd:

Jan 04 16:10:40 ali2600 systemd[1]: Created slice Slice /system/systemd-coredump.
Jan 04 16:10:40 ali2600 systemd[1]: Started Process Core Dump (PID 7507/UID 0).
Jan 04 16:10:42 ali2600 systemd-coredump[7508]: elfutils disabled, parsing ELF objects not supported
Jan 04 16:10:42 ali2600 systemd-coredump[7508]: [LNK] Process 173 (systemd-udevd) of user 0 dumped core.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Main process exited, code=dumped, status=11/SEGV
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Killing process 7503 (systemd-udevd) with signal SIGKILL.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Killing process 7503 (systemd-udevd) with signal SIGKILL.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Failed with result 'core-dump'.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Scheduled restart job, restart counter is at 1.
Jan 04 16:10:42 ali2600 systemd[1]: Stopped Rule-based Manager for Device Events and Files.
Jan 04 16:10:42 ali2600 systemd[1]: Starting Rule-based Manager for Device Events and Files...
Jan 04 16:10:42 ali2600 systemd[1]: systemd-coredump@0-7507-0.service: Deactivated successfully.
Jan 04 16:10:42 ali2600 systemd-udevd[7510]: corrupted size vs. prev_size

......

Jan 04 16:10:57 ali2600 systemd-coredump[7517]: elfutils disabled, parsing ELF objects not supported
Jan 04 16:10:57 ali2600 systemd-coredump[7517]: [LNK] Process 7516 (systemd) of user 0 dumped core.
Jan 04 16:10:57 ali2600 phosphor-dump-manager[356]: *** stack smashing detected ***: terminated
Jan 04 16:10:57 ali2600 phosphor-dump-monitor[280]: Failed to create dump: sd_bus_call noreply: org.freedesktop.DBus.Error.NoReply: Remote peer disconnected
Jan 04 16:10:57 ali2600 systemd[1]: Caught <SEGV>, dumped core as pid 7516.
Jan 04 16:10:57 ali2600 systemd[1]: Freezing execution.
Jan 04 16:10:57 ali2600 phosphor-dump-manager[7536]: Failed to list units: Transport endpoint is not connected

Is this the reason systemctl fails to work? The log says systemd is "Freezing execution".
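
For reference, a rough sketch of how the crash-related lines can be picked out of a long journal. The function name crash_lines is made up for illustration, and the pattern list is only an assumption based on the messages quoted above, so it is not exhaustive:

```shell
#!/bin/sh
# crash_lines (hypothetical helper): read journal text on stdin and print
# only the lines matching crash markers seen in the excerpts above.
crash_lines() {
    grep -E 'dumped core|status=11/SEGV|Caught <SEGV>|Freezing execution|stack smashing detected|corrupted size'
}

# Example (not run here): show the earliest match in the current boot.
#   journalctl -b --no-pager | crash_lines | head -n 1
```

The earliest match is usually the most interesting one, since later failures may only be consequences of the first crash.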

Thanks,

Heyi


On 2023/1/4 6:48 PM, Michal Koutný wrote:
On Wed, Jan 04, 2023 at 04:51:22PM +0800, Heyi Guo <guoheyi@xxxxxxxxxxxxxxxxx> wrote:
The issue happened again, but the /proc/1/stack and
/proc/$pid_of_dbus-broker/stack are both empty on our platform.
(You reported previously the version was v249 (which is behind the last
two upstream versions, so it may be a good idea to raise the issue with
your distro.))

I checked the kernel config and confirmed that CONFIG_STACKTRACE is enabled:

zcat /proc/config.gz | grep CONFIG_STACKTRACE
CONFIG_STACKTRACE_SUPPORT=y
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_STACKTRACE=y

Is there any other config that is missing?
I don't think so (the file wouldn't be present otherwise).

If there are no kernel stacks, the tasks execute in userspace and given
the indefinite stuckage, they're likely looping somewhere (or you must
have been unlucky to miss a syscall), which should manifest in their CPU
consumption.
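
One possible way to check that CPU consumption without extra tools is to sample utime and stime from /proc/<pid>/stat twice. This is only a sketch: the helper names cpu_ticks and cpu_delta and the 5 second default interval are assumptions made for illustration.

```shell
#!/bin/sh
# cpu_ticks (hypothetical helper): print utime+stime, in clock ticks, from
# one /proc/<pid>/stat line passed as the first argument.
# The comm field (2nd) may contain spaces or parentheses, so everything up
# to the last ") " is stripped before the remaining fields are split.
cpu_ticks() {
    rest=${1##*\) }         # now starts at field 3 (state)
    set -- $rest
    # utime is field 14 and stime is field 15 of the full line,
    # i.e. positions 12 and 13 after the strip above.
    echo $(( ${12} + ${13} ))
}

# cpu_delta (hypothetical helper): sample a PID twice and print the number
# of ticks it consumed in between. Usage: cpu_delta <pid> [seconds]
cpu_delta() {
    pid=$1
    interval=${2:-5}
    before=$(cpu_ticks "$(cat "/proc/$pid/stat")")
    sleep "$interval"
    after=$(cpu_ticks "$(cat "/proc/$pid/stat")")
    echo $(( after - before ))
}
```

For example, `cpu_delta 1 5` samples PID 1 over five seconds. A delta that keeps growing would be consistent with a busy loop in userspace, while a delta of 0 suggests the task is not running at all.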

The userspace stack may be of interest then, e.g.
`gdb -ex "bt" --batch -p 1`

(for PID 1 and debuginfo for involved binaries must be present to obtain
useful info).

Michal


