Re: Log file monitoring and event notification

General thought:

It's entirely possible my current Postgres environment is missing 
something (I'm an automation engineer, not a DBA - most of my postgres 
knowledge has been learned on the job or from Google), but we actively 
monitor the receive and replay lag (i.e. comparing 
pg_current_xlog_location() on the master to 
pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on 
the slaves) and alert off of that. We don't use any logs for replication 
alerts.
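
A rough sketch of that comparison in Python, for illustration only. It 
assumes the locations are plain 64-bit byte positions written as two hex 
halves ("X/Y"), which I believe holds for 9.3 and later; the function 
names and the 16 MB threshold are made up for the example. On the server 
side, pg_xlog_location_diff() (9.2+) does the same subtraction.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an 'X/Y' hexadecimal WAL location to an absolute byte position."""
    hi, lo = lsn.strip().split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def lag_bytes(master_lsn: str, standby_lsn: str) -> int:
    """Bytes the standby is behind the master (clamped so it is never negative)."""
    return max(0, lsn_to_int(master_lsn) - lsn_to_int(standby_lsn))


def check_lag(master_lsn, receive_lsn, replay_lsn, warn_bytes=16 * 1024 * 1024):
    """Compare the master location with a standby's receive and replay locations.

    warn_bytes is an arbitrary example threshold, not a recommendation.
    """
    receive_lag = lag_bytes(master_lsn, receive_lsn)
    replay_lag = lag_bytes(master_lsn, replay_lsn)
    return {
        "receive_lag": receive_lag,
        "replay_lag": replay_lag,
        "alert": max(receive_lag, replay_lag) >= warn_bytes,
    }
```

The three inputs would come from running the functions named above on 
the master and on each slave; how you collect them is up to your 
monitoring system.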

We *do*, however, monitor postgres logs for other things. We use Nagios 
(actually Icinga) as our monitoring system, and there's a nice Perl 
plugin available online called check_logfiles 
(http://exchange.nagios.org/directory/Plugins/Log-Files/check_logfiles/details). 
It alerts on regular expressions in a log file, copes very nicely with 
file rotation (even compression), and is highly configurable (including 
Perl hook scripts that run when a match is found).
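
To show the general idea (this is not how check_logfiles is 
implemented, just a minimal Python sketch of the technique), a scanner 
can remember the inode and byte offset it reached last time and start 
over when the file looks rotated. The function name and the state dict 
are my own invention; compressed rotated files are not handled here.

```python
import os
import re


def scan_new_lines(path, state, pattern):
    """Return lines added to `path` since the last call that match `pattern`.

    `state` is a dict holding the inode and byte offset from the previous
    call; it is updated in place. A changed inode, or a file shorter than
    the saved offset, is treated as a rotation and scanning restarts at 0.
    """
    regex = re.compile(pattern)
    st = os.stat(path)
    if state.get("inode") != st.st_ino or st.st_size < state.get("offset", 0):
        state["offset"] = 0
    state["inode"] = st.st_ino

    matches = []
    # Binary mode so that tell() reports a usable byte offset after the loop.
    with open(path, "rb") as fh:
        fh.seek(state["offset"])
        for raw in fh:
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if regex.search(line):
                matches.append(line)
        state["offset"] = fh.tell()
    return matches
```

In real use the state would have to be saved to disk between runs, and 
lines written to the old file just before a rotation would be missed 
unless the rotated file is scanned as well.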

In the simplest case (e.g. if you're not using a real monitoring 
system), you could just configure this script, run it however you want 
(cron?), and mail the output if it exits non-zero.
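
A minimal wrapper along those lines might look like the following 
Python sketch. It runs whatever command you give it and only builds a 
mail when the exit code is non-zero. The helper names and the addresses 
are placeholders, and the sending step assumes an MTA listening on 
localhost, which may not match your setup.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def run_check(cmd):
    """Run `cmd` (a list of arguments); return (exit_code, combined output)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def build_alert(cmd, code, output, sender, recipient):
    """Build the mail to send when the check exits non-zero."""
    msg = EmailMessage()
    msg["Subject"] = f"log check exited {code}: {cmd[0]}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(output or "(no output)")
    return msg


def main(cmd, sender="postgres@example.com", recipient="dba@example.com"):
    """Run the check and mail its output on failure. Addresses are placeholders."""
    code, output = run_check(cmd)
    if code != 0:
        msg = build_alert(cmd, code, output, sender, recipient)
        # Assumption: a local MTA accepts mail on localhost port 25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
    return code
```

From cron you would call main() with the full command line of the check 
script as a list of arguments.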

In terms of embedding things in Postgres, I'm a staunch believer that 
for performance and reliability, something like alerting shouldn't be 
embedded in the application itself but should be handled by an external 
(and easily replaceable) component. It's easy enough to do with 
logging_collector or with syslog (AFAIK the worry about not capturing 
everything only applies if you're shipping syslog over the network, not 
if you're running a syslogd on the same host as postgres and writing 
the logs locally).

From a systems management/monitoring standpoint, I'd much rather see 
something in postgres that sends detailed, well-structured log messages 
to a message queue than put the alerting logic in it (syslog works with 
everything, but it's so horribly obsolete).
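
As a sketch of what I mean by structured messages, here is an external 
Python shim that turns a stderr-format log line into a JSON event and 
hands it to a queue. It assumes log_line_prefix = '%t ' (timestamp and a 
space), uses the standard library's in-process queue.Queue as a stand-in 
for a real message broker, and the field names are my own choice.

```python
import json
import queue
import re

# Assumption: log_line_prefix = '%t ', e.g.
# "2014-04-05 11:47:00 EDT FATAL:  some message"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+) "
    r"(?P<severity>[A-Z0-9]+):\s+(?P<message>.*)$"
)


def to_event(line, host):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # continuation lines and other formats are skipped
    event = m.groupdict()
    event["host"] = host
    return event


def publish(line, host, q):
    """Put the JSON-encoded event on `q` (a stand-in for a message queue)."""
    event = to_event(line, host)
    if event is not None:
        q.put(json.dumps(event, sort_keys=True))
    return event
```

Whatever consumes the queue can then decide what is worth alerting on, 
which keeps that logic out of postgres.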

-Jason

On 04/05/14 11:47, Andy Colson wrote:
> Hi All.
>
> I've started using replication, and I'd like to monitor my logs for 
> any errors or problems.  I don't want to do it manually, and I'm not 
> interested in stats (a la PgBadger).
>
> What I'd like, is the instant PG logs: "FATAL: wal segment already 
> removed" (or some such bad thing), I'd like to get an email.
>
> 1st: is anyone using a program that does something like this? What do 
> you use?  How do you like it?
>
> My thinking has been along these lines:
>
>  + log to syslog doesn't really help, and I recall seeing somewhere 
> "syslog may not capture everything".  I still have monitoring and log 
> rotation problems.
>
>  + log to stderr and write my own collector works, but then I have to 
> duplicate what logging_collector already does (rotating, truncating, 
> age, size, etc).  Too much work.
>
>  + log with logging_collector, then write a thing to figure out what 
> file it's writing to and tail it, watch for rotation, etc. This is just 
> messy.
>
> If there isn't a program already available (which I've searched for, 
> believe me), I'd like to get feedback on extending logging_collector 
> with some lua scriptable event notification.
>
> Lua is small, fast, and mostly easy to embed.  It would allow an admin 
> to customize whatever kind of monitoring they want.  When an event 
> matches logging_collector would spawn off a different app to handle 
> the event notification.  The app would be launched in the background 
> and forgotten about so that logging isn't delayed.
>
> I'm thinking:
>
> function checkLine(item)
>   if item:find('FATAL') then
>      launch('/usr/bin/mynotify.pl', item)
>   end
> end
>
> Logging_collector would then do something like (forgive the perl 
> pseudo code):
>
> ... regular log file rotation stuff ..
> open OUT
> while ($line = <stderr>)
> {
>   checkLine($line);
>   print OUT $line;
> }
>
> ... etc, etc ...
>
> Lua could also have other handy events defined:
>     OnLogRotate(newFile)
>     OnStartup()
>     OnShutdown()
>
>
> Lua can also keep state, so maybe you don't want to email on the first 
> FATAL, but on the third.
>
> local cc = 0
> function checkLine(item)
>   if item:find('FATAL') then
>      cc = cc + 1
>      if cc > 2 then
>        launch('/usr/bin/mynotify.pl', item)
>        cc = 0
>      end
>   end
> end
>
> Thoughts?
>
> -Andy
>
>


-- 

Jason Antman | Systems Engineer | CMGdigital
jason.antman@xxxxxxxxxx | p: 678-645-4155


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




