On 6/11/07, Vladimir Stankovic <V.Stankovic@xxxxxxxxxx> wrote:
Hi all, It seems that I have an issue with the performance of a PostgreSQL server. I'm running write-intensive, TPC-C like tests. The workload consist of 150 to 200 thousand transactions. The performance varies dramatically, between 5 and more than 9 hours (I don't have the exact figure for the longest experiment). Initially the server is relatively fast. It finishes the first batch of 50k transactions in an hour. This is probably due to the fact that the database is RAM-resident during this interval. As soon as the database grows bigger than the RAM the performance, not surprisingly, degrades, because of the slow disks. My problem is that the performance is rather variable, and to me non-deterministic. A 150k test can finish in approx. 3h30mins but conversely it can take more than 5h to complete. Preferably I would like to see *steady-state* performance (where my interpretation of the steady-state is that the average throughput/response time does not change over time). Is the steady-state achievable despite the MVCC and the inherent non-determinism between experiments? What could be the reasons for the variable performance? - misconfiguration of the PG parameters (e.g. autovacuum does not cope with the dead tuples on the MVCC architecture) - file fragmentation - index bloat - ???
vmstat is telling you that the server is i/o bound. an iostat will tell be helpful to tell you where things are binding up...either the data volume, wal volume, or both. I suspect your sorts are spilling to disk which is likely the cause of the variable performance, interacting with autovacuum. Another possibility is vacuum is bogging you down. look for pg_tmp folders inside the database tree to see if this is happening. Also you want to see if your server is swapping. first, I'd suggest bumping maintenance_work_mem to 256mb. I'd also suggest bumping work_mem higher, but you are going to have to calculate how far to go based on how many active queries with sort are going to fire simultaneously. It can be a fine line because your a bit underpowered memory but your database is small as well. bumping work_mem but throwing your server into swap solves nothing. merlin