Hello,
I'm running a fairly complex query on my postgres-8.4.9 Ubuntu box. The box has 8-core CPU, 18G of RAM and no virtualization layer. The query takes many hours to run. The query essentially involves a join of two large tables on a common string column, but it also includes joins with other smaller tables. I looked at the query plan, as produced by EXPLAIN and found it reasonable.
When the query starts, Linux `top' shows fairly low RAM usage (in the hundreds of MB range) and about 100% CPU usage for the relevant postgres process. I'm also seeing some temp files being generated by this postgres process in the postgres temp directory. All this makes sense to me.
As the query processes further, the memory usage by this postgres process shoots up to 12G resident and 17G virtual, while the CPU usage falls down to single-digit percents. The need to utilize more memory at some point during query execution seems in agreement with the query plan.
I feel that my server configuration is not optimal: I would like to observe close to 100% CPU utilization on my queries, but seeing 20 times lower values.
My query forks a single large-RAM process running on the server. There are other queries running on the same server, but they are quick and light on memory.
I cannot explain the following observations:
* Postgres is not writing temp files into its temp directory once the RAM usage goes up, but vmstat shows heavy disk usage, mostly the "swap in" field is high. Top shows 6G of swap space in use.
* All my attempts to limit postgres' memory usage by playing with postgres config parameters failed.
Here are the relevant parameters from postgresql.conf (I did use SHOW parameter to check that the parameters have been read by the server). I think I'm using the defaults for all other memory-related configurations.
shared_buffers = 2GB (tried 8GB, didn't change anything)
work_mem = 128MB (tried 257MB, didn't change anything)
wal_buffers = 16MB
effective_cache_size = 12GB (tried 2GB didn't change anything)
In order to resolve my issue, I tried to search for postgres profiling tools and found no relevant ones. This is rather disappointing. That's what I expected to find:
* A tool that could explain to me why postgres is swapping.
* A tool that showed what kind of memory (work mem vs buffers, etc) was taking all that virtual memory space.
* A tool for examining plans of the running queries. It would be helpful to see what stage of the query plan the server is stuck on (e.g. mark the query plans with some symbols that indicate "currently running", "completed", "results in memory/disk", etc).
I realize that postgres is a free software and one cannot demand new features from people who invest their own free time in developing and maintaining it. I am hoping that my feedback could be useful for future development.
Thanks!