On Di, 11.03.25 11:46, Windl, Ulrich (u.windl@xxxxxx) wrote:

> Hi!
>
> A SLES15 SP6 machine running in VMware recently showed severe I/O hangs (which seem to be related to Veam backup software making snapshots when also VMware snapshots of the VM exist).
> The point was that even direct reads were hanging for about three minutes until the kernel logged a "kernel: sd 0:0:1:0: [sdb] tag#801 task abort on host 0, 00000000aade996c".
> So most likely the read would not provide any data while the write would not have stored any.
>
> In that context I noticed journald dumping core like this:
>
> Feb 25 08:03:30 v04 kernel: sd 0:0:1:0: [sdb] tag#217 task abort on host 0, 000000004f9d9a0f
> Feb 25 08:03:30 v04 systemd[1]: Finished User Runtime Directory /run/user/0.
> Feb 25 08:03:30 v04 systemd[1]: Starting User Manager for UID 0...
> Feb 25 08:03:30 v04 systemd-coredump[24229]: Process 747 (systemd-journal) of user 0 dumped core.
> Feb 25 08:03:30 v04 systemd-coredump[24229]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.54731128d84044c8922ec7e1e329e024.747.1740467>
> Feb 25 08:03:30 v04 systemd-coredump[24229]: Stack trace of thread 747:
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #0 0x00007fe837923f3a fsync (libc.so.6 + 0x123f3a)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #1 0x00007fe837e5bda3 n/a (libsystemd-shared-254.so + 0x25bda3)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #2 0x00007fe837e5ead5 journal_file_append_object (libsystemd-shared-254.so + 0x25ead5)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #3 0x00007fe837e63c3e n/a (libsystemd-shared-254.so + 0x263c3e)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #4 0x00007fe837e64675 journal_file_append_entry (libsystemd-shared-254.so + 0x264675)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #5 0x00005582df343506 n/a (systemd-journald + 0x10506)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #6 0x00005582df355306 n/a (systemd-journald + 0x22306)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #7 0x00005582df34711c n/a (systemd-journald + 0x1411c)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #8 0x00007fe837e88674 n/a (libsystemd-shared-254.so + 0x288674)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #9 0x00007fe837e88941 sd_event_dispatch (libsystemd-shared-254.so + 0x288941)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #10 0x00007fe837e89208 sd_event_run (libsystemd-shared-254.so + 0x289208)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #11 0x00005582df33b98d n/a (systemd-journald + 0x898d)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #12 0x00007fe837840e6c __libc_start_call_main (libc.so.6 + 0x40e6c)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #13 0x00007fe837840f35 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x40f35)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: #14 0x00005582df33bbe1 n/a (systemd-journald + 0x8be1)
> Feb 25 08:03:30 v04 systemd-coredump[24229]: ELF object binary architecture: AMD x86-64
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Consumed 2.973s CPU time.
> Feb 25 08:03:30 v04 systemd[1]: Started User Manager for UID 0.
> Feb 25 08:03:30 v04 systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2.
>
> The point is: Is it expected that journald aborts this way, or is it
> considered to be a bug? Version was
> systemd-254.23-150600.4.25.1.x86_64

It's almost certainly triggered by WatchdogSec=. i.e. if services hang
for too long, we might restart them automatically to try to make
things better. This is of course ignorant of the question whether this
is a system wide hang or a specific hang on that service though.

See
https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#WatchdogSec=
for details.

Lennart

--
Lennart Poettering, Berlin
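
For readers unfamiliar with the watchdog mechanism, here is a minimal Python sketch of the service side of the protocol, as documented in sd_notify(3) and sd_watchdog_enabled(3). It is not journald's actual code (journald is written in C); the function names and the 180-second value in the usage note are illustrative assumptions only. When WatchdogSec= is set, the service manager exports WATCHDOG_USEC and expects a "WATCHDOG=1" datagram on NOTIFY_SOCKET within that interval. A process stuck in a blocking call such as fsync(), as in the stack trace above, cannot send it, so the manager aborts the process (SIGABRT by default, which would match the status=6/ABRT in the log).

```python
import os
import socket


def watchdog_interval_seconds(environ=os.environ):
    """Return a keepalive interval (half the timeout) in seconds, or None.

    WATCHDOG_USEC and WATCHDOG_PID are the environment variables the
    service manager sets when WatchdogSec= is configured for the unit.
    """
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None  # no watchdog configured for this service
    pid = environ.get("WATCHDOG_PID")
    if pid and int(pid) != os.getpid():
        return None  # the watchdog request is meant for another process
    # Pinging at half the timeout is the commonly recommended margin.
    return int(usec) / 1_000_000 / 2


def notify(message, environ=os.environ):
    """Send one notification datagram; return False if no socket is set."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def keepalive():
    """Hypothetical helper: call from the main loop after each unit of work.

    If the loop blocks (for example in a stalled fsync()), this is never
    reached and the watchdog timeout expires.
    """
    return notify("WATCHDOG=1")
```

With an example value of WATCHDOG_USEC=180000000, watchdog_interval_seconds() would return 90.0, i.e. the service should call keepalive() at least every 90 seconds. A storage stall of several minutes would therefore exceed the timeout regardless of whether the service itself is at fault.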