Hi, On 2 Květen 2012, 15:19, Martin Grotzke wrote: > Hi, > > we want to see if we can gain better performance with our postgresql > database. In the last year the amount of data growed from ~25G to now > ~140G and we're currently developing a new feature that needs to get > data faster from the database. The system is both read and write heavy. What does the read/write heavy mean? How much data / transactions you need to handle, how many clients, etc.? > At first I want to give you an overview over the hardware, software and > configuration and the changes that I see we could check out. I'd be very > happy if you could review and tell if the one or the other is nonsense. > > Hardware: > - CPU: 4x4 Cores Intel Xeon L5630 @ 2.13GHz > - RAM: 64GB > - RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i > (I don't know the actual number of discs) > - A single partition for data and wal-files Have you done any benchmarks with that hardware, to verify the performance? Can you do that now (i.e. stopping the database so that you can run them)? > > Software > - RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64 > - postgresql90-server-9.0.6-1PGDG.rhel6.x86_64 > > Configuration (selected from settings) > ------------------------------+-----------+--------+------------------- > name | setting | unit | source > ------------------------------+-----------+--------+------------------- > autovacuum | on | [NULL] | configuration file > checkpoint_completion_target | 0.5 | [NULL] | default > checkpoint_segments | 16 | | configuration file > checkpoint_timeout | 300 | s | default > commit_delay | 0 | | default > default_statistics_target | 100 | | default > effective_cache_size | 16384 | 8kB | default > fsync | on | [NULL] | default > log_min_duration_statement | 250 | ms | configuration file > log_temp_files | -1 | kB | default > maintenance_work_mem | 16384 | kB | default > max_connections | 2000 | | configuration file > random_page_cost | 4 | [NULL] | default > shared_buffers | 1310720 | 8kB | configuration file > synchronous_commit | on | [NULL] | default > wal_buffers | 256 | 8kB | configuration file > wal_sync_method | fdatasync | [NULL] | default > wal_writer_delay | 200 | ms | default > work_mem | 1024 | kB | default > ------------------------------+-----------+--------+------------------- > > Some stats: > $ free -m > total used free shared buffers cached > Mem: 64413 63764 649 0 37 60577 > -/+ buffers/cache: 3148 61264 > Swap: 8191 333 7858 > > iostat shows nearly all the time ~100% io utilization of the disc > serving the pg data / wal files. That's rather useless value, especially if you don't know details about the RAID array. With multiple spindles, the array may be 100% utilized (ratio of time it spent servicing requests) yet it may absorb more. Imagine a RAID with 2 drives, each 50% utilized. The array may report 100% utilization yet it's actually 50% utilized ... > > I'd suggest the following changes: > > (Improve query planning) > 1) Increase effective_cache_size to 48GB > 2) Increase work_mem to 10MB (alternatively first activate > log_temp_files to see if this is really needed > 3) Reduce random_page_cost to 1 > > (WAL / I/O) > 4) Set synchronous_commit=off > 5) Increase checkpoint_segments to 32 > 6) Increase wal_buffers to 16M > 7) Add new discs (RAID) for wal files / pg_xlog > > (Misc) > 8) Increase maintainance_work_mem to 1GB > > In parallel I'd review statistics like long running queries, index usage > (which ones can be dropped) etc. Reviewing long-running stats queries is a good starting point - you need to find out where the bottleneck is (I/O, CPU, ...) and this may be helpful. Dropping unused indexes is quite difficult - most of the time I see the case with multiple similar indexes, all of them are used but it's possible to remove some of them with minimal performance impact. > At first I'd like to try out 1) to 3) as they affect the query planner, > so that some indices that are not used right now might be used then. If you don't know where the issue is, it's difficult to give any advices. But in general, I'd say this 1) setting effective_cache_size to 48G - seems like a good idea, better match for your environment 2) increasing work_mem - might help, but you should check the slow queries first (enabling log_temp_files is a good idea) 3) setting random_page_cost is a really bad idea IMHO, especially with spinners, rather weak controller and unknown details about the array So do (1), maybe (2) and I'd definitely vote against (3). Regarding the other options: 4) synchronous_commit=off - well, this may improve the performance, but it won't fix the underlying issues and it may introduce other application-level issues (expecting the transaction to be committed etc.) 5) Increase checkpoint_segments to 32 - Do you see a lot of checkpoint-related warnings in the log? If not, this probably won't fix anything. If you actually do have issues with checkpoints, I'd recommend increasing the default checkpoint timeout (eg. to 30 minutes), significantly increasing the number of segments (e.g. to 64 or more) and tuning the completion target (e.g. to 0.9). 6) Increase wal_buffers to 16M - may help, but I would not expect a tremendous improvement. 7) Add new discs (RAID) for wal files / pg_xlog - good idea, moving those to a separate spindles may help a lot. > After this change I'd review index usage and clean up those / improve > queries. > > Then, finally I'd test WAL / I/O related changes. Why do you want to do this last? Chances are that writes are causing many of the I/O issues (because it needs to actually fsync the data). Tuning this will improve the general I/O performance etc. > Do you think this makes sense? Do you see other improvements, or do you > need some more information? First of all, find out more about the RAID array. Do some basic I/O tests (with dd etc.). Moreover, I've noticed you do have max_connections=2000. That's insanely high in most cases, unless you're using commit_delay/commit_siblings. A reasonable value is usually something like "num of cpus + num of drives" although that's just a rough estimate. But given that you have 16 cores, I'd expect ~100 or something like that. If you need more, I'd recommend a pooler (e.g. pgpool). Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance