Re: Health check for a service managed by systemd


On 26.07.19 at 15:37, Debraj Manna wrote:
> Thanks Reindl for replying. 
> 
> Can we make use of the watchdog & systemd-notify functionality of
> systemd? I mean something like this. 

probably you can but i doubt you gain anything

you just increase complexity with additional points of failure, and
since you have to use curl in both cases i don't see what it buys you
over just calling "systemctl condrestart"

write a nice log line with 'logger', using the same wording as a failed
service; i do that because a cronjob collects all those events systemwide
to trigger cron mails
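
a minimal shell sketch of such a log line. the wording is taken from the
monitor script further down; the helper names, the tag "monitor-httpd"
and the priority "daemon.err" are assumptions for illustration only:

```shell
# hypothetical helper: build the log line for a unit, using the same
# wording as the monitor script further down
failure_line() {
    printf '%s: Service hold-off time over, scheduling restart' "$1"
}

# hypothetical helper: hand the line to syslog via logger(1);
# tag and priority are assumptions, use whatever your collector looks for
log_failure() {
    logger -t monitor-httpd -p daemon.err "$(failure_line "$1")"
}
```

calling "log_failure httpd.service" right before the restart would then
leave one greppable line per restart in the system log.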

don't forget 'condrestart', because it's not funny when you stop a
service on purpose and some monitoring fires it up unasked; been there
with mysqld 10 years ago.....
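
"condrestart" is the compatibility alias for "systemctl try-restart",
which only restarts a unit that is running. the shell sketch below spells
out roughly the same decision by hand, just to show the idea; both
function names are hypothetical, and in practice the single condrestart
call is all you need:

```shell
# hypothetical helper: the monitor may only restart a unit whose
# state is reported as "active"
may_restart() {
    [ "$1" = "active" ]
}

# hypothetical wrapper: ask systemd for the state, then decide;
# "systemctl condrestart <unit>" does this in one step
restart_if_running() {
    state="$(systemctl is-active "$1" || true)"
    if may_restart "$state"; then
        systemctl restart "$1"
    fi
}
```

a unit that was stopped on purpose reports "inactive" and is left alone.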

the script below is part of my httpd rpm and has worked like a charm for
years; "$max_fail_count = 3" together with the sleep is important in the
real world, because when you are under load and temporarily out of workers
it's not funny when some "crap" restarts the webserver and resets the PHP
bytecode cache all the time
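
the restart condition of the script below, isolated as a small shell
sketch so the two guards are easy to see. the values 3 and 60 come from
the script; the function name and the argument order are assumptions:

```shell
MAX_FAIL_COUNT=3   # failed checks needed before a restart
HOLD_OFF=60        # minimum seconds between two restarts

# hypothetical helper: $1 = failed checks so far, $2 = current unix time,
# $3 = unix time of the last restart (0 if there was none)
restart_due() {
    [ "$1" -ge "$MAX_FAIL_COUNT" ] && [ $(( $2 - $3 )) -gt "$HOLD_OFF" ]
}
```

"restart_due 3 1000 950" is false: three failures, but the last restart
was only 50 seconds ago, so the webserver gets time to recover first.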

---------------------------------------------------------------------

[root@testserver:~]$ systemctl status monitor-httpd
● monitor-httpd.service - Monitor/Restart Webserver
   Loaded: loaded (/usr/lib/systemd/system/monitor-httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-07-25 04:48:57 CEST; 1 day 11h ago
 Main PID: 821 (php)
    Tasks: 1 (limit: 512)
   Memory: 3.7M
   CGroup: /system.slice/monitor-httpd.service
           └─821 /usr/bin/php -n -d display_errors=1 -d display_startup_errors=1 /usr/bin/monitor-httpd.php https://rhsoft.testserver.rhsoft.net/robots.txt

---------------------------------------------------------------------

[root@testserver:~]$ cat /usr/lib/systemd/system/monitor-httpd.service
[Unit]
Description=Monitor/Restart Webserver
After=httpd.service network-online.target
Requires=network-online.target
ConditionPathExists=/etc/sysconfig/monitor-httpd
ConditionPathExists=/usr/bin/monitor-httpd.php
ConditionPathExists=/usr/bin/php

[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/monitor-httpd
ExecStart=/usr/bin/php -n -d display_errors=1 -d display_startup_errors=1 /usr/bin/monitor-httpd.php $MONITOR_URL

Restart=always
RestartSec=5
TimeoutSec=5

User=root
Group=root

CapabilityBoundingSet=CAP_KILL
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
PrivateDevices=yes
PrivateTmp=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectSystem=strict

[Install]
WantedBy=multi-user.target

---------------------------------------------------------------------

[root@testserver:~]$ cat /etc/sysconfig/monitor-httpd
MONITOR_URL=https://rhsoft.testserver.rhsoft.net/robots.txt

---------------------------------------------------------------------

[root@testserver:~]$ cat /usr/bin/monitor-httpd.php
#!/usr/bin/php
<?php declare(strict_types=1);
/** make sure we are running as shell-script */
if(PHP_SAPI !== 'cli')
{
 exit("FORBIDDEN\n");
}

/** we need a test URL as param */
if(empty($_SERVER['argv'][1]))
{
 exit("USAGE: monitor-httpd.php <URL>\n");
}

/** do not verify certificates */
stream_context_set_default(['ssl'=>['verify_peer'=>FALSE,
'verify_peer_name'=>FALSE, 'allow_self_signed'=>TRUE]]);

/** lower default timeouts */
ini_set('default_socket_timeout', '5');

/** init vars */
$max_fail_count = 3;
$fail_count     = 0;
$last_restart   = 0;

/** service loop */
while(true)
{
 if(check_service() !== TRUE)
 {
  $fail_count++;
  sleep(3);
 }
 else
 {
  /** only consecutive failures count */
  $fail_count = 0;
 }
 /** avoid false positives and too fast restarts */
 if($fail_count >= $max_fail_count && (time()-$last_restart) > 60)
 {
  echo __FILE__ . ": ERROR - httpd.service: Service hold-off time over, scheduling restart\n";
  passthru('/usr/bin/systemctl condrestart httpd.service');
  $fail_count   = 0;
  $last_restart = time();
 }
 /** sleep 10 seconds between checks */
 sleep(10);
}

/**
 * check if service is available and responds
 *
 * @access public
 * @return bool
*/
function check_service(): bool
{
 /** file_get_contents() returns FALSE on connection errors and HTTP error responses */
 return @file_get_contents($_SERVER['argv'][1]) !== FALSE;
}
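
for the "http code is not 200" case from the original question, the same
kind of check can be sketched with curl in shell. this is only a sketch
under assumptions: the function names are made up, "-k" mirrors the
disabled certificate checks above and "--max-time 5" mirrors the 5 second
timeout. unlike the PHP check it treats everything except an exact 200 as
a failure:

```shell
# hypothetical helper: print only the HTTP status code of the URL in $1;
# curl prints 000 when it got no response at all
http_code() {
    curl -s -k -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# hypothetical helper: healthy means exactly 200
is_healthy() {
    [ "$1" = "200" ]
}
```

a check would then read: is_healthy "$(http_code "$MONITOR_URL")"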

> [Unit]
> Description=Test service
> After=network.target
> 
> [Service]
> Type=notify
> # test.sh wrapper script to call the service
> ExecStart=/opt/test/test.sh
> Restart=always
> RestartSec=1
> TimeoutSec=5
> WatchdogSec=5
> 
> [Install]
> WantedBy=multi-user.target
> 
> Then in test.sh can we do something like 
> 
> #!/bin/bash
> trap 'kill $(jobs -p)' EXIT
> 
> # Start the actual service
> /opt/test/service &
> PID=$!
> 
> /bin/systemd-notify --ready
> while(true); do
>     FAIL=0
>     kill -0 $PID
>     if [[ $? -ne 0 ]]; then FAIL=1; fi
> 
> #    curl http://localhost/test/
> #    if [[ $? -ne 0 ]]; then FAIL=1; fi
> 
>     if [[ $FAIL -eq 0 ]]; then /bin/systemd-notify WATCHDOG=1; fi
> 
>     sleep 1
> done
> 
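
if you go the watchdog route anyway: with WatchdogSec= set, systemd
exports the timeout to the service as WATCHDOG_USEC (microseconds), and
the usual recommendation is to send the keep-alive at about half of that
interval. a small shell sketch, the function name is hypothetical:

```shell
# hypothetical helper: seconds to sleep between two WATCHDOG=1 pings,
# half of the timeout given in microseconds in $1, but at least 1
watchdog_sleep() {
    usec="${1:-0}"
    secs=$(( usec / 2000000 ))
    if [ "$secs" -lt 1 ]; then secs=1; fi
    echo "$secs"
}
```

in the loop above that would be: sleep "$(watchdog_sleep "$WATCHDOG_USEC")"

also check NotifyAccess= for the unit: systemd-notify(1) notes that
systemd refuses notifications sent with the tool unless it is set
appropriately.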
> 
> On Fri, Jul 26, 2019 at 12:27 AM Reindl Harald <h.reindl@xxxxxxxxxxxxx> wrote:
> 
> 
> 
>     On 25.07.19 at 20:38, Debraj Manna wrote:
>     > I have a service on an Ubuntu 16.04 system which I control with
>     > systemctl start, stop, restart and status.
>     >
>     > One time systemctl status returned active, but the application
>     > "behind" the service responded with an HTTP code other than 200.
>     >
>     > So I would like to restart the service when the HTTP code is not 200.
>     > Can someone let me know whether there is a way to achieve this via
>     > systemd?
> 
>     nope, just write a separate service with a little curl magic and
>     "systemctl condrestart" and remember that you have to avoid premature
>     restarts just because of a little load peak
_______________________________________________
systemd-devel mailing list
systemd-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/systemd-devel



