Re: Odd behaviour on boot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 7, 2022, 19:43 Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:
On 07/02/2022 14:06, Mantas Mikulėnas wrote:
> On Mon, Feb 7, 2022 at 2:54 PM Wols Lists <antlists@xxxxxxxxxxxxxxx
> <mailto:antlists@xxxxxxxxxxxxxxx>> wrote:
>
>     Bear in mind I did have a malformed scarletdme.service file, it was
>     missing "Type=forking", but it shouldn't be bringing down unrelated
>     services, should it?
>
>     This is the output from dovecot, which clearly failed to start on
>     boot...
>
>     thewolery /dev # systemctl status dovecot
>     × dovecot.service - Dovecot IMAP/POP3 email server
>            Loaded: loaded (/lib/systemd/system/dovecot.service; enabled;
>     vendor preset: disabled)
>            Active: failed (Result: exit-code) since Mon 2022-02-07 07:55:11
>     GMT; 3min 53s ago
>              Docs: man:dovecot(1)
>     https://doc.dovecot.org/ <https://doc.dovecot.org/>
>           Process: 1511 ExecStart=/usr/sbin/dovecot -F (code=killed,
>     signal=TERM)
>           Process: 1529 ExecStop=/usr/bin/doveadm stop (code=exited,
>     status=75)
>          Main PID: 1511 (code=killed, signal=TERM)
>               CPU: 22ms
>
>     Feb 07 07:55:09 thewolery systemd[1]: Started Dovecot IMAP/POP3 email
>     server.
>     Feb 07 07:55:11 thewolery doveadm[1529]: Fatal: Dovecot is not running
>     (read from /run/dovecot/ma>
>     Feb 07 07:55:11 thewolery systemd[1]: dovecot.service: Control process
>     exited, code=exited, statu>
>     Feb 07 07:55:11 thewolery systemd[1]: dovecot.service: Failed with
>     result 'exit-code'.
>     thewolery /dev # systemctl restart dovecot
>
>     But both samba and sshd failed similarly. I have vague recollections
>     somewhere of seeing a reference to qm in either the sshd or samba
>     output
>     pre-restart, but don't know how to get back to it. (qm is the program
>     started by scarletdme.service.)
>
>     Any ideas, anybody suspect it might be a bug in systemd? I've fixed the
>     scarletdme.service, but it's a bit weird that it's the first time I
>     booted with the broken .service, and three (at least) other services
>     failed. Although my system does seem to have stability problems, so I
>     don't know for certain where to place any blame.
>
>
> Have you checked the whole `journalctl -b` for messages that happened
> around the actual failure? Could be just about anything, from services
> getting stopped due to their dependencies failing, to something missing
> due to *lack of* dependencies, to OOM killing random processes...
>
eb 07 07:55:09 thewolery systemd[1]: Reached target Socket Units.
Feb 07 07:55:09 thewolery systemd[1]: Reached target Basic System.
Feb 07 07:55:09 thewolery systemd[1]: Condition check resulted in
Save/Restore Sound Card State being skipped.
Feb 07 07:55:09 thewolery systemd[1]: Condition check resulted in Manage
Sound Card State (restore and store) being skipped.
Feb 07 07:55:09 thewolery systemd[1]: Reached target Sound Card.
Feb 07 07:55:09 thewolery systemd[1]: Started D-Bus System Message Bus.
Feb 07 07:55:09 thewolery systemd[1]: Started Dovecot IMAP/POP3 email
server.
Feb 07 07:55:09 thewolery systemd[1]: Starting Postfix Mail Transport
Agent...
Feb 07 07:55:09 thewolery systemd[1]: Started ScarletDME service start.
Feb 07 07:55:09 thewolery systemd[1]: Starting Samba SMB Daemon...
Feb 07 07:55:09 thewolery systemd[1]: Starting OpenSSH server daemon...
Feb 07 07:55:09 thewolery systemd[1]: Starting User Login Management...
Feb 07 07:55:09 thewolery systemd[1]: Starting Permit User Sessions...
Feb 07 07:55:09 thewolery systemd[1]: Finished Permit User Sessions.
Feb 07 07:55:09 thewolery qm[1513]: ScarletDME (64 Bit) has been started
Feb 07 07:55:09 thewolery systemd-journald[1105]: Journal stopped
Feb 07 07:55:10 thewolery systemd-journald[1105]: Received SIGTERM from
PID 1521 (qm).

Well that's interesting.

Looks like the 'qm' service here randomly killed systemd-journald – this alone can cause several problems, as many services' stdout goes to journald, so killing it would cause many services to get -EPIPE as soon as they log a message.

Feb 07 07:55:09 thewolery systemd[1]: Started Dovecot IMAP/POP3 email
server.
Feb 07 07:55:09 thewolery systemd[1]: Starting Postfix Mail Transport
Agent...
Feb 07 07:55:09 thewolery systemd[1]: Started ScarletDME service start.
Feb 07 07:55:09 thewolery systemd[1]: Starting Samba SMB Daemon...
Feb 07 07:55:09 thewolery systemd[1]: Starting OpenSSH server daemon...
Feb 07 07:55:09 thewolery systemd[1]: Starting User Login Management...
Feb 07 07:55:09 thewolery systemd[1]: Starting Permit User Sessions...
Feb 07 07:55:09 thewolery systemd[1]: Finished Permit User Sessions.
Feb 07 07:55:09 thewolery qm[1513]: ScarletDME (64 Bit) has been started
Feb 07 07:55:09 thewolery systemd-journald[1105]: Journal stopped
Feb 07 07:55:10 thewolery systemd-journald[1105]: Received SIGTERM from
PID 1521 (qm).

It does that again.

Feb 07 07:55:10 thewolery systemd[1]: systemd-journald.service:
Scheduled restart job, restart counter is at 1.
Feb 07 07:55:10 thewolery systemd[1]: Stopped Journal Service.
Feb 07 07:55:10 thewolery systemd[1]: Starting Journal Service...
Feb 07 07:55:10 thewolery systemd-journald[1539]: Journal started
Feb 07 07:55:10 thewolery systemd-journald[1539]: System Journal
(/var/log/journal/053db4036a25efc6f8d2186a603c3f79) is 768.4M, m>Feb 07
07:55:09 thewolery systemd[1]: dbus.service: Deactivated successfully.
Feb 07 07:55:09 thewolery lvm[1285]: dmeventd detected break while being
idle for 1644220509 second(s), exiting.
Feb 07 07:55:09 thewolery systemd[1]: smb.service: Failed with result
'protocol'.

I'm not sure what happened here, but it *could* be that it died due to the journald stdout pipe being broken (which happens when the receiving end, i.e. journald, dies).

Feb 07 07:55:09 thewolery lvm[1285]: dmeventd shutting down.
Feb 07 07:55:09 thewolery systemd[1]: Failed to start Samba SMB Daemon.
Feb 07 07:55:09 thewolery dmeventd[1285]: You have a memory leak (not
released memory pool):
Feb 07 07:55:09 thewolery systemd[1]: systemd-udevd.service: Deactivated
successfully.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018000c40] library
Feb 07 07:55:10 thewolery systemd[1]: Started Journal Service.
Feb 07 07:55:09 thewolery systemd[1]: systemd-udevd.service: Consumed
1.986s CPU time.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x55a9444f6ed0] command
Feb 07 07:55:09 thewolery systemd[1]: sshd.service: Control process
exited, code=killed, status=15/TERM

Given previous messages, I'm strongly suspecting that qm killed this one, too.


Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018004470] pending_delete
Feb 07 07:55:09 thewolery systemd[1]: sshd.service: Failed with result
'signal'.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f70180044c0] config
Feb 07 07:55:09 thewolery systemd[1]: Failed to start OpenSSH server daemon.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018004600] config
Feb 07 07:55:09 thewolery systemd[1]: systemd-logind.service:
Deactivated successfully.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018004650] config
Feb 07 07:55:09 thewolery systemd[1]: Started User Login Management.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018024c00] dev_cache
Feb 07 07:55:09 thewolery systemd[1]: scarletdme.service: Deactivated
successfully.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018028ee0] text_orphan
Feb 07 07:55:09 thewolery systemd[1]: systemd-udevd.service: Scheduled
restart job, restart counter is at 1.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f701802d0e0] format_instance
Feb 07 07:55:09 thewolery systemd[1]: systemd-logind.service: Scheduled
restart job, restart counter is at 1.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f701802fd10] sysfs
Feb 07 07:55:09 thewolery systemd[1]: Started D-Bus System Message Bus.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018030400] filter regex
Feb 07 07:55:09 thewolery systemd[1]: Stopped User Login Management.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7018034460] mirror_dso
Feb 07 07:55:09 thewolery systemd[1]: Starting Load Kernel Module drm...
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f701803a680] snapshot_state
Feb 07 07:55:09 thewolery systemd[1]: Stopped Rule-based Manager for
Device Events and Files.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7010000c40] snapshot_state
Feb 07 07:55:09 thewolery systemd[1]: systemd-udevd.service: Consumed
1.986s CPU time.
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7014000c40] snapshot_state
Feb 07 07:55:09 thewolery systemd[1]: Starting Rule-based Manager for
Device Events and Files...
Feb 07 07:55:09 thewolery dmeventd[1285]:  [0x7f7008000c40] snapshot_state
Feb 07 07:55:09 thewolery systemd[1]: modprobe@drm.service: Deactivated
successfully.
Feb 07 07:55:09 thewolery dmeventd[1285]: Internal error: Unreleased
memory pool(s) found.
Feb 07 07:55:09 thewolery systemd[1]: Finished Load Kernel Module drm.
Feb 07 07:55:09 thewolery systemd[1]: Starting User Login Management...
Feb 07 07:55:09 thewolery systemd[1]: Started Rule-based Manager for
Device Events and Files.
Feb 07 07:55:10 thewolery systemd-logind[1528]: New seat seat0.

Basically you have *a lot* of service daemons getting killed – some of them restart cleanly, others not – and it looks a bit like scarletdme is the one doing it. It even looks like it ended up killing itself, too.

Some of the errors seem unrelated, e.g. dmeventd's complaints are probably just fallout of unclean restarts.


Nothing obvious before that - a couple of lines in yellow that I ought
to investigate, but dunno why they'd be a problem ...

Trouble is, I'm not used to debugging dodgy systems, and I've only just
started using systemd seeing as I run gentoo, and that's not its default...

Cheers,
Wol

[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux