Several optimization options (config/hardware)

Martin Grotzke <martin.grotzke@xxxxxxxxxxxxxx> · Wed, 02 May 2012 15:19:00 +0200

Hi,

we want to see if we can gain better performance with our postgresql
database. In the last year the amount of data growed from ~25G to now
~140G and we're currently developing a new feature that needs to get
data faster from the database. The system is both read and write heavy.

At first I want to give you an overview over the hardware, software and
configuration and the changes that I see we could check out. I'd be very
happy if you could review and tell if the one or the other is nonsense.

Hardware:
- CPU: 4x4 Cores Intel Xeon L5630  @ 2.13GHz
- RAM: 64GB
- RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
  (I don't know the actual number of discs)
- A single partition for data and wal-files

Software
- RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
- postgresql90-server-9.0.6-1PGDG.rhel6.x86_64

Configuration (selected from settings)
------------------------------+-----------+--------+-------------------
             name             |  setting  |  unit  |       source
------------------------------+-----------+--------+-------------------
 autovacuum                   | on        | [NULL] | configuration file
 checkpoint_completion_target | 0.5       | [NULL] | default
 checkpoint_segments          | 16        |        | configuration file
 checkpoint_timeout           | 300       | s      | default
 commit_delay                 | 0         |        | default
 default_statistics_target    | 100       |        | default
 effective_cache_size         | 16384     | 8kB    | default
 fsync                        | on        | [NULL] | default
 log_min_duration_statement   | 250       | ms     | configuration file
 log_temp_files               | -1        | kB     | default
 maintenance_work_mem         | 16384     | kB     | default
 max_connections              | 2000      |        | configuration file
 random_page_cost             | 4         | [NULL] | default
 shared_buffers               | 1310720   | 8kB    | configuration file
 synchronous_commit           | on        | [NULL] | default
 wal_buffers                  | 256       | 8kB    | configuration file
 wal_sync_method              | fdatasync | [NULL] | default
 wal_writer_delay             | 200       | ms     | default
 work_mem                     | 1024      | kB     | default
------------------------------+-----------+--------+-------------------

Some stats:
$ free -m
             total  used    free   shared  buffers  cached
Mem:         64413  63764    649        0       37   60577
-/+ buffers/cache:   3148   61264
Swap:         8191    333    7858

iostat shows nearly all the time ~100% io utilization of the disc
serving the pg data / wal files.

I'd suggest the following changes:

(Improve query planning)
1) Increase effective_cache_size to 48GB
2) Increase work_mem to 10MB (alternatively first activate
log_temp_files to see if this is really needed
3) Reduce random_page_cost to 1

(WAL / I/O)
4) Set synchronous_commit=off
5) Increase checkpoint_segments to 32
6) Increase wal_buffers to 16M
7) Add new discs (RAID) for wal files / pg_xlog

(Misc)
8) Increase maintainance_work_mem to 1GB

In parallel I'd review statistics like long running queries, index usage
(which ones can be dropped) etc.

At first I'd like to try out 1) to 3) as they affect the query planner,
so that some indices that are not used right now might be used then.

After this change I'd review index usage and clean up those / improve
queries.

Then, finally I'd test WAL / I/O related changes.

Do you think this makes sense? Do you see other improvements, or do you
need some more information?

Thanx in advance,
cheers,
Martin

Attachment:
signature.asc

Description: OpenPGP digital signature