>>> Lennart Poettering <lennart@xxxxxxxxxxxxxx> schrieb am 31.07.2020 um 21:57 in Nachricht <20200731195737.GA213997@gardel-login>: > On Fr, 31.07.20 11:52, Ulrich Windl (Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx) > wrote: > >> Hi! >> >> For systemd-228-157.12.5.x86_64 (SLES12 SP5) I had an odd status: >> Three services were re-started in parallel, and one of those failed (stop >> routine killed itself with signal 6 due to a timeout): >> >> Now the status systemctl displays looks like this (some details omitted): >> ● iotwatch@NFS1.service - iotwatch I/O performance monitor instance "NFS1" >> Loaded: loaded (/etc/iotwatch.conf; enabled; vendor preset: disabled) >> Active: active (running) since Fri 2020-07-31 11:31:57 CEST; 6min ago >> ... >> Main PID: 12246 (iotwatch-NFS1) >> Tasks: 4 (limit: 512) >> CGroup: /system.slice/system-iotwatch.slice/iotwatch@NFS1.service >> └─12246 iotwatch-NFS1 -l /var/log/iotwatch/NFS1/iotwatch-NFS1.log >> ... >> >> Jul 31 11:31:57 v04 systemd[1]: Starting iotwatch I/O performance moni..... >> Jul 31 11:31:57 v04 systemd[1]: Started iotwatch I/O performance monit...". >> Jul 31 11:31:59 v04 systemd-coredump[12190]: Process 7123 (iotwatch-NFS1... >> >> Stack trace of thread > 7123: >> Hint: Some lines were ellipsized, use -l to show in full. >> >> Now the odd thing is the sequence of journal entries: >> The service started and then dumped core, but still is active? >> >> Actually what happened was: A client told the server process to stop, but > when >> it didn't in time, it killed it with signal 6. So the old process stopped. >> THEN the new process was started (PID 12246). >> >> Is the ordering due to the fact that coredump completed after the new > service >> started? Or is it just systemd's archiving of the core dump file? I thought > the >> pid is visible until core-dump had completed.... > > When a coredump happens we'll collect it and then process it, > i.e. compress it, generate stacktrace of it, and finally when we are > done with all that report it to the journal. Hence it might be that > the coredump processing finishes long after the service entered > another cycle. In fact, we nice the coredump processing to nice level > 9 since doing the stacktrace analysis, the compression and writing > everyting to disk might be quite CPU and IO intensive, hence we > priorize it down in order not to affect scheduling of everything else > as negatively. This however means that if CPU/IO is scarce the > coredump processing might be delayed quite a bit. Hi! OK, some additional notes (maybe these issues are fixed in a newer version already): Looking in my /var/log/messages, I see higher resolution of time stamps, and specifically I see the original core-dump message: 2020-07-31T11:31:57.404198+02:00 v04 systemd[1]: iotwatch@NFS1.service: Main process exited, code=dumped, status=6/ABRT 2020-07-31T11:31:57.421639+02:00 v04 systemd[1]: Stopped iotwatch I/O performance monitor instance "NFS1". (Using "journalctl I see lower-resolution timestamps as output) Jul 31 11:31:57 v04 systemd[1]: iotwatch@NFS1.service: Main process exited, code=dumped, status=6/ABRT Jul 31 11:31:57 v04 systemd[1]: Stopped iotwatch I/O performance monitor instance "NFS1". Jul 31 11:31:57 v04 systemd[1]: iotwatch@NFS1.service: Unit entered failed state. Jul 31 11:31:57 v04 systemd[1]: iotwatch@NFS1.service: Failed with result 'core-dump'. So why is the "core dump complete" included in the status of "starting" the service when the core dump happened when stopping the service, and why isn't the original core dump message included as well? Regards, Ulrich > > Lennart > > -- > Lennart Poettering, Berlin _______________________________________________ systemd-devel mailing list systemd-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/systemd-devel