[Sorry for the length of this post. It stretched as I tried to provide as much info as possible.]

So the rubber meets the road: we've put PostgreSQL into a production environment with some heavy simultaneous usage. It works well in general, but PG often stops responding. How should I test what is going wrong? All tables are vacuumed, analyzed and so on. INSERTs and UPDATEs are happening, but not much; it's mostly SELECTs. Is PostgreSQL running out of connections? We can fix this temporarily by restarting it, but I'd like a more tenable solution. Suspecting it could be some conf variable somewhere (max_fsm_pages in particular), I am including three things at the bottom of this post:

1. Our ps output (for "postgres")
2. *Verbose* vacuum info for a table that shows the max_fsm warning
3. Our postgresql.conf settings

My question 1 -- how should we test and tweak our production installation? Where should we look? In MySQL we could do a "SHOW STATUS" at the console and it would give a mountain of information, and then there was that handy little "tuning-primer" script that made it all come alive. I suppose this information is also available in pg_catalog, but is there any website that goes in depth into HOW to tune, what the different values mean, and so on?

My question 2 -- in production, we're constantly seeing this message while vacuuming one table with fewer than 3 million rows, but one that we expect to keep growing:

    WARNING:  relation "public.links" contains more than "max_fsm_pages" pages with useful free space
    HINT:  Consider compacting this relation or increasing the configuration parameter "max_fsm_pages".
    VACUUM

I can merrily increase the "max_fsm_pages" directive, but the manual also caveats that with "this can use more System V memory than available on your system".
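On question 1, I've started poking at the statistics views myself. If I'm reading the 8.2 docs correctly, queries along these lines should roughly stand in for MySQL's "SHOW STATUS" (column names are from the 8.2-era views; the per-table counters need stats_start_collector = on, which we have):

```sql
-- How many backends are connected right now (vs. max_connections = 250)?
SELECT count(*) FROM pg_stat_activity;

-- Per-database activity counters
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit
FROM pg_stat_database;

-- Per-table scan and write counters; lots of seq_scan may mean missing indexes
SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
```

But I don't know how to turn these raw numbers into tuning decisions, which is really what I'm asking about.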
My full verbose vacuum info below includes the line:

    INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live rows and 0 dead rows; 3000 rows in sample, 2710124 estimated total rows

Does this mean my table needs nearly 200,000 pages, and that that should be the setting of max_fsm_pages?

This server is a fairly common setup these days: dual AMD Opterons, 4GB memory, SATA RAID 1, 250GB each. I don't mind letting postgres use up to 1GB of the memory for itself, but the rest is needed for other things.

From http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html , it seems "max_fsm_relations" is about how many tables and indexes can be tracked in the free space map. Does this mean the number of actual tables and indexes in my postgres databases? For example, if I run only 5 databases, each of which has about 10 tables and 20 indexes, then I have only 150 (5 * 30) actual "relations" in PostgreSQL lingo. So my max_fsm_relations setting can be 150? (Which seems a little low compared to what I see in several posts online.)

Assuming 150 is OK, that manual page, and other tuning material such as http://www.revsys.com/writings/postgresql-performance.html , suggest that "max_fsm_pages" is even more critical. The manual says this should be at least 16 times max_fsm_relations, so in my example it should be at least 150 * 16 = 2400. This seems abysmally low! And if I up this figure to, say, 24000 instead, I still keep seeing the kinds of errors posted above.

My question 3 -- for a SELECT-heavy database, is there any tried-and-tested caching tool that could be of use? I'd like to skip connection pooling if possible, but would be very interested in good caching products or contribs. I noticed pgmemcache ( http://pgfoundry.org/projects/pgmemcache/ ) but it's in beta 1.2. Would love some thoughts from people who have used it. Is it worth the effort?
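To sanity-check the memory worry myself, I tried the arithmetic. The 8.2 docs say each max_fsm_pages slot costs about six bytes of shared memory (that figure is my reading of the manual, please correct me if wrong); the page counts below are from my VACUUM VERBOSE output further down:

```python
# Back-of-the-envelope FSM sizing from the VACUUM VERBOSE output.
# Assumption: ~6 bytes of shared memory per max_fsm_pages slot,
# per the 8.2 runtime-config-resource docs.

pages_with_free_space = 137272  # "137272 pages contain useful free space"
total_pages = 199396            # total pages in the traders table

# max_fsm_pages must cover pages with reusable space across the whole
# cluster, so this one table alone already overflows our setting of 60000:
assert pages_with_free_space > 60000

# Rough shared-memory cost of tracking, say, 250000 page slots:
bytes_per_slot = 6
cost_mb = 250000 * bytes_per_slot / (1024 * 1024)
print(round(cost_mb, 2))  # prints 1.43
```

If that's right, even a generous max_fsm_pages costs well under 2MB of shared memory on a 4GB box, so is the manual's System V caveat really a concern at this scale?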
More of my info below. TIA for your thoughts and advice!

-/Phoenix

==========EXHIBIT 1: PS OUTPUT==============

> ps auxnm | grep postgres
26 20665  0.0 0.0  11760    612 ?     - Aug18 0:00 postgres: logger process
26 20670  0.0 1.1 188684  48312 ?     - Aug18 0:00 postgres: writer process
26 20671  0.0 0.0  12032    804 ?     - Aug18 0:28 postgres: stats collector process
26 14497  0.0 4.1 452108 172656 ?     - 02:05 0:02 postgres: traders_traders traders 127.0.0.1(56204) VACUUM
 0  9444  0.0 0.0   5008    656 pts/0 - 02:53 0:00 grep postgres

==========EXHIBIT 2: POSTGRESQL.CONF==============

listen_addresses = 'localhost,*'
max_connections = 250
shared_buffers = 21000        # Not much more: http://snipr.com/pgperf
effective_cache_size = 32000
max_fsm_relations = 500
max_fsm_pages = 60000
sort_mem = 4096               # Low when not needed: http://snipr.com/pgperf
work_mem = 4096
temp_buffers = 4096
authentication_timeout = 10s
ssl = off

#VACUUM SETTINGS
autovacuum = on
vacuum_cost_delay = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 300
autovacuum_analyze_threshold = 100

#FOR BACKGROUND TASKS PERFORMANCE
wal_buffers = 64
checkpoint_segments = 128
checkpoint_timeout = 900
fsync = on
maintenance_work_mem = 256MB  # Too high? Well, watch for it...

==========EXHIBIT 3: VACUUM VERBOSE OUTPUT==========

mydbuser=# vacuum analyze verbose traders;
INFO:  vacuuming "public.traders"
INFO:  scanned index "traders_pkey" to remove 6 row versions
DETAIL:  CPU 0.07s/0.04u sec elapsed 140.61 sec.
INFO:  scanned index "idx_traders_userid" to remove 6 row versions
DETAIL:  CPU 0.05s/0.04u sec elapsed 49.70 sec.
INFO:  scanned index "idx_traders_mdate" to remove 6 row versions
DETAIL:  CPU 0.02s/0.04u sec elapsed 32.66 sec.
INFO:  scanned index "traders_unique_alias" to remove 6 row versions
DETAIL:  CPU 0.10s/0.11u sec elapsed 167.20 sec.
INFO:  "traders": removed 6 row versions in 5 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "traders_pkey" now contains 2780925 row versions in 22821 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_traders_userid" now contains 2780925 row versions in 11785 pages
DETAIL:  6 index row versions were removed.
127 index pages have been deleted, 127 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_traders_mdate" now contains 2780925 row versions in 7912 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "traders_unique_alias" now contains 2780925 row versions in 9342 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "traders": found 6 removable, 2780925 nonremovable row versions in 199396 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 2959732 unused item pointers.
137272 pages contain useful free space.
0 pages are entirely empty.
CPU 0.74s/0.40u sec elapsed 1335.71 sec.
WARNING:  relation "public.traders" contains more than "max_fsm_pages" pages with useful free space
HINT:  Consider compacting this relation or increasing the configuration parameter "max_fsm_pages".
INFO:  vacuuming "pg_toast.pg_toast_41513"
INFO:  index "pg_toast_41513_index" now contains 26 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_41513": found 0 removable, 26 nonremovable row versions in 5 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
4 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "public.traders"
INFO:  "traders": scanned 3000 of 199396 pages, containing 40775 live rows and 0 dead rows; 3000 rows in sample, 2710124 estimated total rows
VACUUM
Time: 1533601.235 ms