
Re: High inserting by syslog


 



Valter Douglas Lisbôa Jr. wrote:
Hello all, I have a Perl script that loads an entire day's Squid log into a Postgres table. I run it at midnight via a cron job and turn the indexes off before loading (turning them back on afterwards). The script works fine, but I want to change this to a different approach.

I'd like to insert the log lines on the fly, as they are generated, so the data is available on-line. But the table has some indexes and the volume is about 300,000 lines/day, so the average insert rate is 3.48/sec. I think this could overload the database server (I have not tested it yet), so I want to create an unindexed table to receive the on-line inserts, with a job at midnight moving all the lines to the main indexed table.

There are two things to bear in mind.

1. What you need to worry about is the peak rate of inserts, not the average. Even at 30 rows/sec that's not too bad.
2. What will your system do if the database is taken offline for a period? How will it catch up?

The limiting factor will be the speed of your disks. Assuming a single disk (no battery-backed raid cache) you'll be limited to your RPM (e.g. 10,000 commits/minute). That will fall off rapidly if you only have one disk and it's busy doing other reads/writes. But if you batch many log-lines together you need far fewer commits.
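To make the arithmetic above concrete (the figures come from the thread; the batch size of 100 is just an illustrative assumption), a quick sketch:

```python
# Figures from the thread: ~300,000 log lines/day.
rows_per_day = 300_000
avg_rows_per_sec = rows_per_day / 86_400  # one commit per row -> ~3.47/sec average

# A single 10,000 RPM disk with no battery-backed cache can sustain at most
# one synchronous commit per rotation, i.e. roughly 10,000 commits/minute.
max_commits_per_min = 10_000

# Batching (assumed here: 100 lines per transaction) cuts the commit count 100-fold.
batch_size = 100
commits_per_day = rows_per_day / batch_size

print(round(avg_rows_per_sec, 2))  # ~3.47
print(commits_per_day)             # 3000.0 commits/day instead of 300,000
```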

So - to address both points above, I'd use a script with a flexible batch-size.
1. Estimate how many log-lines need to be saved to the database.
2. Batch together a suitable number of lines (1-1000) and commit them to the database.
3. Sleep 1-10 secs
4. Back to #1, disconnect and reconnect every once in a while.

If the database is unavailable for any reason, this script will automatically catch up when it returns, feeding rows in larger batches.
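The loop above might be sketched like this in Python, with the database commit stubbed out as a callable (the queue source, batch limits and `run_loader` name are assumptions for illustration, not from the original post; a real loader would also reconnect periodically, per step 4, and commit each batch in one transaction):

```python
import time
from collections import deque

def run_loader(pending, commit_batch, *, max_batch=1000, sleep_secs=1):
    """Adaptive batch loop: drain more rows per commit when a backlog builds.

    pending      -- deque of log lines waiting to be saved (filled elsewhere)
    commit_batch -- callable that writes a list of lines in one transaction
    """
    while pending:
        # 1. Estimate how many log-lines need to be saved.
        backlog = len(pending)
        # 2. Batch a suitable number (1-1000) and commit them together.
        n = min(max(backlog, 1), max_batch)
        batch = [pending.popleft() for _ in range(n)]
        commit_batch(batch)
        # 3. Sleep briefly so we don't hammer the server when idle.
        time.sleep(sleep_secs)
        # 4. Loop back; a production loader would also disconnect and
        #    reconnect every once in a while.

# Example: after an outage the backlog is large, so batches grow to
# max_batch and the loader catches up in just a few commits.
committed = []
queue = deque(f"line {i}" for i in range(2500))
run_loader(queue, committed.append, sleep_secs=0)
print([len(b) for b in committed])  # [1000, 1000, 500]
```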

My question is: does a better solution exist, or is this tactic a good way to do this?

You might want to partition the table monthly. That will make it easier to manage a few years from now.
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

Also, consider increasing checkpoint_segments if you find the system gets backed up. Perhaps consider setting synchronous_commit to off (but only for the connection saving the log-lines to the database).
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html

--
  Richard Huxton
  Archonet Ltd

