On Mon, Aug 19, 2013 at 10:44 PM, Rural Hunter <ruralhunter@xxxxxxxxx> wrote:
-- Hi,
I'm on 9.2.4 with Ubuntu server. There are usually hundereds of connections doing the same insert with different data from different networks every minute, through pgbouncer in the same network of the database server. The database has been running for about one year without problem. Yesterday I got a problem that the connection count limit of the database server is reached. I checked the connections and found that there are many inserts hanging there. I checked the load(cpu,memory,io) of the db server but seems everything is fine. I also checked pg log and I only found there are one "incomplete message from client" error message every several minute.
It may not be related, it can be some kind of monitoring tool checking if PostgreSQL is listening on 5432 (or whatever) port. Do you have it?
The I recycled pgbouncer and kept monitoring the connections. I found the majority of the inserts finish quickly but every minute there are several inserts left and seems hanging there . So after a while, the connection limit is reached again. Besides those inserts, there are no other long run queries and auto vacuums. I also checked the locks of the inserts and found they were all granted. The insert statement itself is very simple and it only inserts one row but there are some triggers involved. They might impact the performance but I have never experience any since the majority of the inserts are fine.
I would check this triggers first. If you execute (by hand) the same insert (perhaps inside a transaction, followed by a rollback) does it hangs? If so, you can try to trace what these triggers are doing, perhaps the easier/faster way would be the old and good RAISE NOTICE (if it is PL/pgSQL). Or even, try to execute the trigger's source by hand, if it is not really huge; a EXPLAIN ANALYZE of the queries inside it may help.
I already have problems with a system were some UPDATEs suddenly started hungging (like your case), and it was really an SELECT inside a trigger that was with bad plans (some adjustment on ANALYZE parameters for one table helped in the case).
The problem persisted about 1-2 hours. I didn't do anything except recycling pgbouncer a few times. After that period, everything goes back to normal. It's has been 24 hours and it didn't happen again.
>From the error message in pg log, I supect it might be the network problem from some clients. Could anyone point out if there are other possible causes? I'm also wondering what those inserts are doing actually when they are hanging there, such as if they are in the trigger or not. Anything I can get similar with the connection snapshots in db2?
Regards,
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres