We have an interesting thing happening on one of our DBs: when autovacuum runs against the pg_catalog tables (or we run a manual vacuum), we see a large delay in opening new connections. If this happens during a busy period on our system, we end up with 200-300 connections trying to open at once. All the backend processes for those connections show up in the ps listing in a "startup" state. Because the connections don't actually finish opening until the delay clears, the maximum-connection limit configured in the pool on our application servers is never hit, which is why so many connection attempts pile up. Query performance on the existing open connections is also affected (which in turn drives more connection attempts, since the existing connections are released back to the pool more slowly).
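(As far as I can tell, backends still stuck in the startup phase haven't registered themselves in pg_stat_activity yet, which is why we were counting them with ps. For the connections that have finished starting up, a rough count from inside the database is just:

-- Connections that have completed startup, grouped by database;
-- backends still in the "startup" phase may not show up here yet.
SELECT datname, count(*)
FROM pg_stat_activity
GROUP BY datname;
)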
The database contains roughly 1,300 schemas, each with about 1,350 objects; each schema stores the data for a single client. As part of our demo system we create about 100 new schemas each day, and roughly the same number are dropped each day as demos expire.
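With that much churn, the system catalogs themselves accumulate a lot of dead rows. For anyone wanting to see the same picture, a query along these lines (standard statistics views, nothing exotic) shows which catalog tables are biggest and how many dead tuples they carry:

-- Size and dead-tuple counts for the system catalogs; with heavy
-- schema creation/removal, pg_class, pg_attribute and pg_depend
-- are the usual suspects.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_sys_tables
WHERE schemaname = 'pg_catalog'
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;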
On a test server, if I manually run a vacuum on pg_catalog.pg_class and then attempt to open 20 connections from a remote client, it takes an average of 64 seconds to get a connection open, with the longest taking 80 seconds.
If I immediately run another vacuum of pg_class (with no schemas created or dropped in between) and then attempt to open 20 connections from the same client, the average rises to 84 seconds, with the longest taking 123 seconds. The behavior isn't 100% consistent, though: sometimes the subsequent vacuum run doesn't slow the connections at all.
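(The test itself is nothing fancy; roughly this in a psql session, with the 20 connections opened from a separate remote client while it runs:

\timing on
VACUUM pg_catalog.pg_class;
)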
We pulled a core dump from one of the backend processes while it was in this state and got the backtrace below; it shows the backend waiting on a buffer-mapping lock (FirstBufMappingLock) while scanning the catalogs to rebuild its relation cache during connection startup:
#0 0x00000036e74d5d77 in semop () from /lib64/libc.so.6
#1 0x00000000005d42e3 in PGSemaphoreLock (sema=0x2b8ff150ec40, interruptOK=0 '\000') at pg_sema.c:418
#2 0x000000000060d796 in LWLockAcquire (lockid=FirstBufMappingLock, mode=LW_SHARED) at lwlock.c:464
#3 0x00000000005fada8 in BufferAlloc (smgr=0x47c7c10, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=680095, mode=RBM_NORMAL, strategy=0x47c3570, hit=0x7fff6dd6a987 "") at bufmgr.c:531
#4 ReadBuffer_common (smgr=0x47c7c10, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=680095, mode=RBM_NORMAL, strategy=0x47c3570, hit=0x7fff6dd6a987 "") at bufmgr.c:325
#5 0x00000000005fb4cd in ReadBufferExtended (reln=0x2b8ff54a9540, forkNum=MAIN_FORKNUM, blockNum=680095, mode=RBM_NORMAL, strategy=0x47c3570) at bufmgr.c:246
#6 0x00000000004676fd in heapgetpage (scan=0x47c2d50, page=680095) at heapam.c:223
#7 0x000000000046804e in heapgettup (scan=0x47c2d50, dir=<value optimized out>, nkeys=2, key=0x47c3680) at heapam.c:556
#8 0x00000000004685fc in heap_getnext (scan=0x47c2d50, direction=1842784336) at heapam.c:1345
#9 0x00000000006c1378 in RelationBuildTupleDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:468
#10 RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:882
#11 0x00000000006c2abe in RelationIdGetRelation (relationId=2610) at relcache.c:1568
#12 0x000000000046b5fb in relation_open (relationId=2610, lockmode=1842784336) at heapam.c:907
#13 0x000000000046b673 in heap_open (relationId=20840476, lockmode=1842784336) at heapam.c:1077
#14 0x00000000006bbd27 in SearchCatCache (cache=0x4777980, v1=2655, v2=0, v3=0, v4=<value optimized out>) at catcache.c:1183
#15 0x00000000006c03da in RelationInitIndexAccessInfo (relation=0x2b8ff54b4688) at relcache.c:996
#16 0x00000000006c1aa5 in RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:904
#17 0x00000000006c1cfd in load_critical_index (indexoid=2655, heapoid=2603) at relcache.c:3080
#18 0x00000000006c3be2 in RelationCacheInitializePhase3 () at relcache.c:2890
#19 0x00000000006d7fcb in InitPostgres (in_dbname=0x47182d0 "aspdata", dboid=0, username=<value optimized out>, out_dbname=0x0) at postinit.c:816
#20 0x0000000000619f3d in PostgresMain (argc=<value optimized out>, argv=0x46f2a28, username=0x46f2950 "aspuser") at postgres.c:3650
#21 0x00000000005df87b in ServerLoop () at postmaster.c:3600
#22 0x00000000005e0497 in PostmasterMain (argc=5, argv=0x46f1c10) at postmaster.c:1115
#23 0x000000000058518e in main (argc=5, argv=<value optimized out>) at main.c:199
As a stopgap we've added three additional schema-storage DBs and are rotating schema creation between them, which reduces the number of schemas in (and the churn on) the single DB we had, and so lessens the impact of vacuuming it. But this only buys us time: if a vacuum hits during a busy period we'll still get the same pileup of 200-300 connections stuck in the startup state.
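(To keep the rotation roughly balanced we just check the client-schema count in each storage DB; the nspname filter below matches our naming and is only illustrative:

-- Count client schemas in this DB, excluding the system schemas.
SELECT count(*) AS client_schemas
FROM pg_namespace
WHERE nspname NOT LIKE 'pg\_%'
  AND nspname <> 'information_schema';
)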
Anyone have any suggestions on how to resolve this?
The documentation suggests that running autovacuum more often can help with performance issues, but given the results I'm seeing on our test box, it seems like doing that would just cause more problems, not fewer.
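The only other mitigation I can think of (a sketch, not something we've proven out yet) is to take the timing into our own hands: vacuum the churn-heavy catalogs ourselves during a quiet window, so autovacuum has less reason to kick in at a bad time. Something like:

-- Manual off-peak vacuum of the catalogs that churn the most
-- with schema creation/removal.
VACUUM ANALYZE pg_catalog.pg_class;
VACUUM ANALYZE pg_catalog.pg_attribute;
VACUUM ANALYZE pg_catalog.pg_depend;
VACUUM ANALYZE pg_catalog.pg_type;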
For our setup we're running PostgreSQL 9.1.1, compiled from source, on CentOS 5.8 x64 (dual Xeon X5650 with 32 GB of RAM):
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit
(1 row)
Our autovacuum settings are currently the defaults:
autovacuum                      | on        | Starts the autovacuum subprocess.
autovacuum_analyze_scale_factor | 0.1       | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
autovacuum_analyze_threshold    | 50        | Minimum number of tuple inserts, updates or deletes prior to analyze.
autovacuum_freeze_max_age       | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
autovacuum_max_workers          | 3         | Sets the maximum number of simultaneously running autovacuum worker processes.
autovacuum_naptime              | 1min      | Time to sleep between autovacuum runs.
autovacuum_vacuum_cost_delay    | 20ms      | Vacuum cost delay in milliseconds, for autovacuum.
autovacuum_vacuum_cost_limit    | -1        | Vacuum cost amount available before napping, for autovacuum.
autovacuum_vacuum_scale_factor  | 0.2       | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
autovacuum_vacuum_threshold     | 50        | Minimum number of tuple updates or deletes prior to vacuum.
Thanks
--Mike