Re: journald intended to dump core on write error?

On Di, 11.03.25 11:46, Windl, Ulrich (u.windl@xxxxxx) wrote:

> Hi!
>
> A SLES15 SP6 machine running in VMware recently showed severe I/O hangs (which seem to be related to Veeam backup software making snapshots when VMware snapshots of the VM also exist).
> The point was that even direct reads were hanging for about three minutes until the kernel logged a "kernel: sd 0:0:1:0: [sdb] tag#801 task abort on host 0, 00000000aade996c".
> So most likely the read would not provide any data while the write would not have stored any.
>
> In that context I noticed journald dumping core like this:
>
> Feb 25 08:03:30 v04 kernel: sd 0:0:1:0: [sdb] tag#217 task abort on host 0, 000000004f9d9a0f
> Feb 25 08:03:30 v04 systemd[1]: Finished User Runtime Directory /run/user/0.
> Feb 25 08:03:30 v04 systemd[1]: Starting User Manager for UID 0...
> Feb 25 08:03:30 v04 systemd-coredump[24229]: Process 747 (systemd-journal) of user 0 dumped core.
> Feb 25 08:03:30 v04 systemd-coredump[24229]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.54731128d84044c8922ec7e1e329e024.747.1740467>
> Feb 25 08:03:30 v04 systemd-coredump[24229]: Stack trace of thread 747:
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #0  0x00007fe837923f3a fsync (libc.so.6 + 0x123f3a)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #1  0x00007fe837e5bda3 n/a (libsystemd-shared-254.so + 0x25bda3)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #2  0x00007fe837e5ead5 journal_file_append_object (libsystemd-shared-254.so + 0x25ead5)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #3  0x00007fe837e63c3e n/a (libsystemd-shared-254.so + 0x263c3e)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #4  0x00007fe837e64675 journal_file_append_entry (libsystemd-shared-254.so + 0x264675)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #5  0x00005582df343506 n/a (systemd-journald + 0x10506)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #6  0x00005582df355306 n/a (systemd-journald + 0x22306)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #7  0x00005582df34711c n/a (systemd-journald + 0x1411c)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #8  0x00007fe837e88674 n/a (libsystemd-shared-254.so + 0x288674)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #9  0x00007fe837e88941 sd_event_dispatch (libsystemd-shared-254.so + 0x288941)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #10 0x00007fe837e89208 sd_event_run (libsystemd-shared-254.so + 0x289208)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #11 0x00005582df33b98d n/a (systemd-journald + 0x898d)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #12 0x00007fe837840e6c __libc_start_call_main (libc.so.6 + 0x40e6c)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #13 0x00007fe837840f35 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x40f35)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #14 0x00005582df33bbe1 n/a (systemd-journald + 0x8be1)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: ELF object binary architecture: AMD x86-64
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Consumed 2.973s CPU time.
> Feb 25 08:03:30 v04 systemd[1]: Started User Manager for UID 0.
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2.
>
> The point is: Is it expected that journald aborts this way, or is it
> considered to be a bug? Version was
> systemd-254.23-150600.4.25.1.x86_64

It's almost certainly triggered by WatchdogSec=, i.e. if a service
hangs for too long, we might restart it automatically to try to make
things better. The watchdog of course cannot tell whether this is a
system-wide hang or a hang specific to that service, though.

See
https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#WatchdogSec=
for details.
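
As a rough illustration of the mechanism (not journald's actual code), here is a minimal Python sketch of the keep-alive side of the watchdog protocol. The environment variables NOTIFY_SOCKET and WATCHDOG_USEC and the "WATCHDOG=1" message are part of the documented sd_notify interface; the function names are made up for this example. A service that is stuck in a blocking call such as fsync() cannot send the keep-alive, so the timeout presumably expires and the service manager aborts the process.

```python
import os
import socket


def watchdog_interval_seconds(env=os.environ):
    """Return half of WATCHDOG_USEC in seconds, or None if no watchdog is set.

    Pinging at half the timeout is the commonly recommended safety margin.
    (Hypothetical helper name, for illustration only.)
    """
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2


def notify(message, env=os.environ):
    """Send one datagram to the socket named by NOTIFY_SOCKET.

    Returns True if a message was sent, False if no socket is configured.
    (Hypothetical helper name, for illustration only.)
    """
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        address = b"\0" + path[1:].encode()
    else:
        address = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True
```

A service's main loop would call notify("WATCHDOG=1") at least once every watchdog_interval_seconds(); if the loop stalls for longer than WatchdogSec=, no ping arrives and the unit fails with result 'watchdog', as in the log above.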

Lennart

--
Lennart Poettering, Berlin


