Hi I have been investigating a problem with one of our production databases where every few weeks the database suddenly came to a grinding halt. When this happened, queries needed much more time to complete and just opening a local db connection with psql could take up to 30 seconds. The PostgreSQL version is 9.2.13. The database has a central table (about 60GB including indexes) with records which can be searched via a web interface. The table is a materialized view that is updated via several custom triggers on other tables. When data is changed, all relevant record are DELETEd and then INSERTed from a view. The table is replicated via slony1 (2.1.4). All search queries are executed on a slave db. The databases run on Xeon servers with 24 cores and 96GB of ram. I managed to reproduce the problem using the following steps: - Perform an update on the master server which causes about 50.000 rows to be repeatedly deleted and inserted in a single transaction. For testing I did this 20 times for about a million deletes and inserts.(*) - Slony starts to replicate the individual deletes and inserts to the slave db. - After this has been running for a few minutes, I run a particular query(**) on the slave repeatedly in up to 20 psql connections. For each query a new connection is opened. - After a few seconds the backend processes go up to almost 100% cpu usage and take a long time to complete. When i run psql on the console it takes anywhere from 0s (almost immediately) to tens of seconds to get to the prompt. (The average time gets worse the longer the replication is running.) After doing some debugging it seems that the backend processes are spending most of their time in spinlocks in TransactionIdIsInProgress() trying to get a lock on ProcArrayLock. This function is called more often (when processing searches) the longer the replication transaction is running. TransactionIdIsInProgress() conveniently has some debug counters enabled via #defining XIDCACHE_DEBUG. When the backend processes processing the searches finish their transaction, the xc_by_main_xid counter is about the same for each search. The longer the slony replication transaction is running, the higher the number gets. Serious slowdowns start at about 100.000 and I had it get up to more than 2.000.000 (not sure whether that corresponds to the number of deleted / inserted tuples). Is this a known problem? Unfortunately, I did not yet manage to create a self contained example. Just copying the table to another database on the slave and deleting / inserting the tuples without slony did not reproduce the issue. I am not sure whether this is due to reduced bloat in the table after pg_dump / pg_restore or whether it is relevant that slony deletes and inserts the rows individually (or something entirely different). (*) On the production server a poor interaction between triggers and the way the application updates data caused the rows to be deleted / inserted many times. Fortunately, we could work around this so there should be no immediate danger that this comes up on the live db again. (**) The search query I used to reproduce the problem does not look particularly nasty and returns about 260 rows in a few hundred milliseconds under normal circumstances. It does no full table scan but uses a few indexes. The returned rows are among the rows that get deleted and inserted repeatedly. Regards, Julian v. Bock -- Julian v. Bock Projektleitung Software-Entwicklung OpenIT GmbH Tel +49 211 239 577-0 In der Steele 33a-41 Fax +49 211 239 577-10 D-40599 Düsseldorf http://www.openit.de ________________________________________________________________ HRB 38815 Amtsgericht Düsseldorf USt-Id DE 812951861 Geschäftsführer: Oliver Haakert, Maurice Kemmann -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general