
Re: PG vs ElasticSearch for Logs

On 8/22/2016 2:39 AM, Thomas Güttler wrote:


On 19.08.2016 at 19:59, Andy Colson wrote:
On 8/19/2016 2:32 AM, Thomas Güttler wrote:
I want to store logs in a simple table.

Here my columns:

  Primary-key (auto generated)
  timestamp
  host
  service-on-host
  loglevel
  msg
  json (optional)

I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We don't have high traffic. About 200k rows per day.

My heart beats for Postgres. We have been using it for several years.

On the other hand, the sentence "Don't store logs in a DB" is
somewhere in my head.....

What do you think?
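
A minimal sketch of what such a table could look like, created here from Perl/DBI (the table name, column types, and connection parameters are illustrative assumptions, not something from the thread):

# assumes the DBD::Pg driver is installed and a reachable database
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=logs", "loguser", "secret",
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS app_log (
    id       bigserial PRIMARY KEY,   -- auto generated primary key
    ts       timestamptz NOT NULL,    -- timestamp
    host     text NOT NULL,
    service  text NOT NULL,           -- service-on-host
    loglevel text NOT NULL,
    msg      text NOT NULL,
    extra    jsonb                    -- optional json
);
SQL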




I played with ElasticSearch a little, mostly because I wanted to use Kibana, which looks really pretty. I dumped a ton of logs into it and made a pretty dashboard ... but in the end it didn't really help me and wasn't that useful. My problem is, I don't want to have to go look at it. If something goes bad, I want an email alert, at which point I'm going to run top and tail the logs.

Another problem I had with Kibana/ES is that the search syntax is different from what I'm used to, which made it hard to find stuff in Kibana.

Right now, I have a Perl script that reads apache logs and fires off updates into PG to keep stats. But it's an hourly summary, which the website then queries to show pretty usage graphs.

You use Perl to read apache logs. Does this work?

Forwarding logs reliably is not easy. Logs are streams, but files in unix are not streams. Sooner or later the files get rotated. RELP exists, but AFAIK its usage is not widespread:

  https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol

Let's see how to get the logs into postgres ....

In the end, PG or ES: it all depends on what you want.

Most of my logs start from an HTTP request. I want a unique id per request in every log line that gets created. This way I can trace the request, even if its impact spans several hosts and systems that do not receive HTTP requests.
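
One way to get such an id could be Apache's mod_unique_id, which sets a UNIQUE_ID environment variable for every request. A sketch (assuming mod_unique_id and mod_headers are loaded; the format string, log path, and the X-Request-Id header name are just examples):

# log the per-request id alongside the usual fields
LogFormat "%{UNIQUE_ID}e %h %t \"%r\" %>s" traced
CustomLog "logs/access_traced.log" traced

# pass the same id on to backend systems so their log lines can carry it too
RequestHeader set X-Request-Id "%{UNIQUE_ID}e"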

Regards,
  Thomas Güttler



I don't read the file.  In apache.conf:

# v, countyia, ip, sess, ts, url, query, status
LogFormat "3,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" csv3

CustomLog "|/usr/local/bin/statSender.pl -r 127.0.0.1" csv3

I think I read somewhere that if you pipe to a script (like above) and you don't read fast enough, it can slow Apache down. That's why the script above dumps to redis first. That way I can move processes around, restart the database, etc., and not break Apache in any way.

The important part of the script:

while (my $x = <>)
{
	chomp($x);
	next unless ($x);
try_again:
	if ($redis)
	{
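		# push the raw log line onto the redis list; a failed push is caught via $@ below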
		eval {
			$redis->lpush($qname, $x);
		};
		if ($@)
		{
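			# the push failed (most likely a dropped connection): reconnect and retry this line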
			$redis = redis_connect();
			goto try_again;
		}
		# just silence this one
		eval {
			$redis->ltrim($qname, 0, 1000);
		};
	}
}

Any other machine, or even multiple, then reads from redis and inserts into PG.
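
For that consumer side, a rough sketch of what such a reader could look like (assuming the Perl Redis, Text::CSV and DBI/DBD::Pg modules; the queue name, table name, and connection details are placeholders, not Andy's actual code):

#!/usr/bin/perl
use strict;
use warnings;
use Redis;
use Text::CSV;
use DBI;

my $qname = 'weblog';                              # placeholder queue name
my $redis = Redis->new(server => '127.0.0.1:6379');
my $dbh   = DBI->connect("dbi:Pg:dbname=stats", "statuser", "secret",
                         { RaiseError => 1, AutoCommit => 1 });
my $csv   = Text::CSV->new({ binary => 1 });

# matches the csv3 LogFormat above: v, county, ip, sess, ts, url, query, status
my $ins = $dbh->prepare(q{
    INSERT INTO hits (county, ip, sess, ts, url, query, status)
    VALUES (?, ?, ?, ?, ?, ?, ?)
});

while (1) {
    my $line = $redis->rpop($qname);        # oldest entry (producer uses lpush)
    if (!defined $line) { sleep 1; next; }  # queue empty, wait a bit
    next unless $csv->parse($line);
    my ($v, $county, $ip, $sess, $ts, $url, $query, $status) = $csv->fields;
    $ins->execute($county, $ip, $sess, $ts, $url, $query, $status);
}

In a real setup the reader would need the same reconnect handling as the producer, and the quoted fields in the Apache CSV may need more careful unescaping, but it shows the shape of the pipeline.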

You can see, in my script, I trim the queue to 1000 items, but that's because I'm not as worried about losing results. Your setup would probably be different. I also set up redis to not save anything to disk, again because I don't mind if I lose a few hits here or there. But you get the idea.

-Andy




