Re: Health check for a service managed by systemd

Mantas Mikulėnas <grawity@xxxxxxxxx> · Fri, 26 Jul 2019 16:45:01 +0300

On Fri, Jul 26, 2019 at 4:37 PM Debraj Manna <subharaj.manna@xxxxxxxxx> wrote:
Can we make use of the watchdog and systemd-notify functionality of systemd? I mean something like this:

[Unit]
Description=Test service
After=network.target

[Service]
Type=notify
# test.sh wrapper script to call the service
ExecStart=/opt/test/test.sh
Restart=always
RestartSec=1
TimeoutSec=5
WatchdogSec=5

[Install]
WantedBy=multi-user.target

Then in test.sh, can we do something like this?

#!/bin/bash
trap 'kill $(jobs -p)' EXIT

# Start the actual service
/opt/test/service &
PID=$!

/bin/systemd-notify --ready
while true; do
    FAIL=0
    kill -0 $PID
    if [[ $? -ne 0 ]]; then FAIL=1; fi

#    curl http://localhost/test/
#    if [[ $? -ne 0 ]]; then FAIL=1; fi

    if [[ $FAIL -eq 0 ]]; then /bin/systemd-notify WATCHDOG=1; fi

    sleep 1
done

That doesn't look nice. It might technically work, but it isn't any better than a standalone periodic check script. On top of that, the script calls --ready without knowing whether the service is actually ready; /bin/systemd-notify as an external binary doesn't work very well; and the way you implement the PID existence check means that even a completely crashed or exited daemon won't get restarted until the watchdog timeout expires...
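
To make two of those points concrete, here is a minimal bash sketch of a wrapper that waits for a readiness probe to succeed and returns as soon as the daemon is gone. The helper names wait_ready and watch_child, and their arguments, are hypothetical and do not come from systemd or from the script above; this is an illustration under those assumptions, not a recommended design.

```shell
#!/bin/bash
# Sketch only: helper names and arguments are hypothetical.

# wait_ready TRIES INTERVAL CMD...: retry CMD until it succeeds.
# Returns 0 on the first success, 1 if every attempt fails.
wait_ready() {
    local tries=$1 interval=$2 i
    shift 2
    for (( i = 0; i < tries; i++ )); do
        "$@" && return 0
        sleep "$interval"
    done
    return 1
}

# watch_child PID INTERVAL [PING...]: run PING every INTERVAL seconds
# while PID exists, then return the child's exit status.
# PID must be a background child of the calling shell.
watch_child() {
    local pid=$1 interval=$2
    shift 2
    # kill -0 delivers no signal; it only tests that the PID exists.
    while kill -0 "$pid" 2>/dev/null; do
        # A failed ping is ignored here; the loop only ends with the child.
        if (( $# > 0 )); then "$@" || true; fi
        sleep "$interval"
    done
    # The child has exited: hand its status back so the wrapper can exit
    # right away instead of idling until the watchdog fires.
    wait "$pid"
}
```

In the wrapper this might be used as `wait_ready 30 1 curl -fsS http://localhost/test/` before calling `systemd-notify --ready`, then `watch_child "$PID" 1 systemd-notify WATCHDOG=1` followed by exiting with its status, so that Restart=always can act immediately. The retry count and interval are arbitrary, and the sketch does nothing about the separate concern with systemd-notify being an external binary.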

Consider something already made for this purpose, such as Monit.

-- 
Mantas Mikulėnas
_______________________________________________
systemd-devel mailing list
systemd-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/systemd-devel