Re: Syslog to postgresql - need peaks of 5.000/s

paul rivers <rivers.paul@xxxxxxxxx> · Sat, 26 Jan 2008 14:49:20 -0800

Michael Monnerie wrote:
Dear list,

I shall build up a central syslog server able to do 5.000 log lines per
second peak, 1.000 sustained. I found some pages describing current
syslog-ng with magic "log to file then insert":
http://ben.muppethouse.com/SYSLOG-DOC.html
http://www.kdough.net/docs/syslog_postgresql/

But according to syslog, they will release the "direct syslog to postgresql"
feature soon:
http://www.balabit.com/network-security/syslog-ng/

Anyway, I'd like to know if anybody has a syslog-ng with postgresql that
does up to 5000 inserts/s, and what hardware I need for that. Of course,
some selects for statistical purposes must also be possible, so not only
writes here. Any ideas?

mfg zmi

Hi Michael,

I do something like this with syslog-ng logs. The processing looks 
something like this:

- Today, about 20 different devices feed the same kind of logs to a 
central syslog server.
- For each device,a parse-and-load (p&l) daemon is running that parses 
data elements out of the log and then stores the parsed data plus a copy 
of the original log line to the db. The parsing is probably less trivial 
than one would guess (much more complex and varied than say web server 
logs, for example).
- This p&l daemon is safely following the tail of the file, and so this 
parsing and loading is happening in near real time, where each log line 
turns into an insert.
- For a given <device>/<year>/<month>/<day> file, the p&l of course 
follows to <day+1> when the next day's file appears.
- Everything is of course transactional with a status table in the db 
tracking where it is, so if a p&l is shutdown for whatever reason, it 
knows what line in the log to pick up and start at again. There are 
config options to commit every so many log lines or every x seconds and 
that sort of thing.

Using the same libraries I wrote for the p&l daemon, there is a bulk 
load utility to take a raw syslog file, parse and load it. The main 
difference between this bulk utility and the p&l is the bulk utility 
completely parses the syslog file and generates an intermediate file it 
then does a COPY to load in one fell swoop.

There is some selects done on this throughout the day, but that's hard 
to quantify -- it varies a lot depending on need. On a nightly basis, 
though, we end up touching the entire day's input to generate summary 
data that allows us to do rapid trend analysis. In addition to some 
ad-hoc access, a slew of reports run periodically against both the raw 
and rollup data. There has been talk of revising the rollup process, so 
it can happen more or less continuously throughout the day. No work has 
been done, but I believe it's entirely feasible.

We also use table partitioning. There is a partition for each day + 
device. This dramatically help performance, so long as you're very up on 
the documented caveats of Pg's partitioning.

I did extensive metrics on what our throughput was for both the p&l 
process and the bulk-loader. In our case, the p&l seemed to peak around 
1,600 lines/sec and the bulk loader around 5,500 lines/sec for the whole 
process: read, parse, load and commit the data to the db. For each 
additional p&l daemon running on the same host, we lose a small 
percentage of peak (same for the bulk processor).

The machine is absolutely nothing special: a 32 bit dell blade, dual 
core 2 cpu machine with 4gb running RHEL4 with postgres 8.2.6. The disk 
is SAN-attached to a hitachi 9990, but that's because that's what where 
I work insists on for any box. This box is running everything -- 
receiving the logs, parsing and loading them, and running the db. [There 
is, incidentally, a redundant machine receiving the same logs and doing 
the same thing.] I'm sorry I don't know offhand how many spindles in the 
SAN back the machine's filesystems, but we're not talking dozens. This 
machine is really considered nothing special. This isn't even the only 
thing this machine is doing.

Also, the parsing of our files adds a significant amount of time. If you 
are doing minimal parsing and more or less bulk-loading the data, I 
would expect you to see far better numbers than the 1,600 l/s and 5,500 
l/s.

Our system has been running for about a year and a half, and postgres 
has always been rock solid. I just ran a query to see we processed about 
1.5 trillion log lines last month. That works out to an average of 
around 560 lines/s or so, which is less than what you're targeting. But 
if rates doubled (either if we doubled the number of tracked devices or 
the existing doubled their volumes), I'm confident there would be no 
problems.

Were I in your shoes, I would start collecting the syslogs immediately, 
and then prototype something up, so you can profile your process end to 
end, think about what you do if you ever need to catch up the loading 
[we never have needed to, but we know we can, which is great], and so 
forth.

I think a lot will be determined by how much you need to process the 
logs before loading, whether your loads are going to be via insert or 
copy, and how many logfiles will you be feeding the db at the same time. 
Hardware and all that will come into play, but until you answer these 
questions first, it's going to be hard to say more about it.

Anyway, that's my experience with this, hope it helps.

Regards,
Paul

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq