General thought: It's entirely possible my current Postgres environment is missing something (I'm an automation engineer, not a DBA; most of my Postgres knowledge has been learned on the job or from Google), but we actively monitor the receive and replay lag (i.e. comparing pg_current_xlog_location() on the master to pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on the slaves) and alert off of that. We don't use any logs for replication alerts.

We *do*, however, monitor Postgres logs for other things. We use Nagios (actually Icinga) as our monitoring system, and there's a nice Perl plugin available online called check_logfiles (http://exchange.nagios.org/directory/Plugins/Log-Files/check_logfiles/details) that handles alerting on regular expressions in a log file, very nicely handles file rotation (even compression), and is highly configurable (including Perl hook scripts to run if a match is found). In the easiest case (like if you're not using a real monitoring system), you could just configure this script, run it however you want (cron?), and if it exits non-zero, mail the output.

In terms of embedding things in Postgres, I'm a staunch believer that for performance and reliability, something like alerting shouldn't be embedded in the application itself but should be handled by an external (and easily replaceable) component. It's easy enough to do with logging_collector, or with syslog (AFAIK the worry about not capturing everything applies only if you're shipping syslog over the network, not if you're running a syslogd on the same host as Postgres and writing the logs locally).

From a systems management/monitoring standpoint, I'd much rather see something in Postgres that sends detailed, well-structured log messages to a message queue than put the alerting logic in it (syslog works with everything, but it's so horribly obsolete).

-Jason

On 04/05/14 11:47, Andy Colson wrote:
> Hi All.
>
> I've started using replication, and I'd like to monitor my logs for any errors or problems. I don't want to do it manually, and I'm not interested in stats (a la PgBadger).
>
> What I'd like, is the instant PG logs: "FATAL: wal segment already removed" (or some such bad thing), I'd like to get an email.
>
> 1st: is anyone using a program that does something like this? What do you use? How do you like it?
>
> My thinking has been along these lines:
>
> + log to syslog doesn't really help, and I recall seeing somewhere "syslog may not capture everything". I still have monitoring and log rotation problems.
>
> + log to stderr and write my own collector works, but then I have to duplicate what logging_collector already does (rotating, truncating, age, size, etc). Too much work.
>
> + log with logging_collector, then write a thing to figure out what file it's writing to and tail it, watch for rotation, etc. This is just messy.
>
> If there isn't a program already available (which I've searched for, believe me), I'd like to get feedback on extending logging_collector with some lua scriptable event notification.
>
> Lua is small, fast, and mostly easy to embed. It would allow an admin to customize whatever kind of monitoring they want. When an event matches logging_collector would spawn off a different app to handle the event notification. The app would be launched in the background and forgotten about so that logging isn't delayed.
>
> I'm thinking:
>
>     function checkLine(item)
>         if item:find('FATAL') then
>             launch('/usr/bin/mynotify.pl', item)
>         end
>     end
>
> Logging_collector would then do something like (forgive the perl pseudo code):
>
>     ... regular log file rotation stuff ..
>     open OUT
>     while ($line = <stderr>)
>     {
>         checkLine($line);
>         print OUT $line;
>     }
>
>     ... etc, etc ...
>
> Lua could also have other handy events defined:
>
>     OnLogRotate(newFile)
>     OnStartup()
>     OnShutdown()
>
> Lua can also keep state, so maybe you don't want to email on the first FATAL, but on the third.
>
>     local cc = 0
>     function checkLine(item)
>         if item:find('FATAL') then
>             cc = cc + 1
>             if cc > 2 then
>                 launch('/usr/bin/mynotify.pl', item)
>                 cc = 0
>             end
>         end
>     end
>
> Thoughts?
>
> -Andy

--
Jason Antman | Systems Engineer | CMGdigital
jason.antman@xxxxxxxxxx | p: 678-645-4155

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
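
As a rough illustration of the "easiest case" mentioned in the reply (an external script run from cron that exits non-zero when something matches), here is a minimal Python sketch. It is not how check_logfiles works internally. The function names, the state-file approach, and the FATAL/PANIC pattern are all assumptions made for illustration. It only copes with a log file that is truncated or recreated under the same name; if rotation produces a new file name, the caller would have to work out which file is current.

```python
import os
import re

# Assumed pattern: which severities are worth an email is site-specific.
ALERT_RE = re.compile(r"\b(FATAL|PANIC)\b")


def scan_new_lines(log_path, offset, pattern=ALERT_RE):
    """Return (matching_lines, new_offset) for text appended since `offset`.

    If the file is now shorter than `offset`, assume it was truncated or
    recreated and start again from the beginning.
    """
    if os.path.getsize(log_path) < offset:
        offset = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Consume complete lines only; a partial last line is left for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    matches = [line for line in lines if pattern.search(line)]
    return matches, offset + end


def run_once(log_path, state_path):
    """One cron-style pass: print new matches, save the offset, return an exit code."""
    try:
        with open(state_path) as fh:
            offset = int(fh.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0  # no usable state yet: scan from the start
    matches, new_offset = scan_new_lines(log_path, offset)
    with open(state_path, "w") as fh:
        fh.write(str(new_offset))
    for line in matches:
        print(line)
    return 1 if matches else 0
```

A cron wrapper would call run_once() with the current log file and a state file of its choosing and pass the return value to sys.exit(), so that cron (or a monitoring system) can mail the printed lines on a non-zero exit. Keeping the offset outside Postgres is the point: the alerting piece stays external and easy to replace.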