Why would log_lock_waits affect a query plan?

Evan Martin <postgresql2@xxxxxxxxxxxxxxxxx> · Wed, 19 Jul 2017 22:43:03 +0200

I have an application that imports a lot of data and then does some 
queries on it to build some caches in the database, all in one long 
transaction. One of those cache updates repeatedly calls a plpgsql 
function, which internally does some SQL queries. Sometimes this is 
much, much slower than usual: 3-7 hours instead of 12-15 minutes. It was 
totally reproducible when it happened, though (running on the same 
machine, same input data).

It turns out that the problem only happened when the "log_lock_waits" 
setting was OFF! Many machines had it ON (to troubleshoot a different 
problem), so they never experienced it.

I eventually tracked it down to the query plan chosen for one particular 
query in the plpgsql function: using a Nested Loop makes it fast and 
using a Hash Join makes it very slow. Running an ANALYZE on one of the 
tables involved fixes the problem: the fast query plan is then chosen 
every time. This in itself is a bit strange, because I was already running 
ANALYZE on all tables after the data import - it seems that I needed to 
run it a second time? But what I'd really like to understand is: why did 
setting log_lock_waits to ON always change the query plan to use a 
Nested Loop? It's just not something I'd ever expect to affect a query plan.
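
For anyone who wants to reproduce the comparison, here is a minimal 
sketch of the kind of checks involved. The table names (import_data, 
cache_source) and columns (id, source_id, value) are hypothetical 
placeholders, not my real schema, and the query is only a stand-in for 
the one inside the plpgsql function. Loading auto_explain this way 
normally requires superuser.

```sql
-- Hypothetical tables and columns throughout; substitute the real ones.

-- Confirm which value of the setting this session actually sees.
SHOW log_lock_waits;

-- Check when statistics were last gathered for the tables involved.
SELECT relname, last_analyze, last_autoanalyze,
       n_live_tup, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('import_data', 'cache_source');

-- Plan and timing the planner chooses by default.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.id, d.value
FROM cache_source AS c
JOIN import_data AS d ON d.source_id = c.id;

-- Same query with hash joins discouraged, to compare against the
-- alternative join strategy. SET LOCAL is undone by the ROLLBACK.
BEGIN;
SET LOCAL enable_hashjoin = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.id, d.value
FROM cache_source AS c
JOIN import_data AS d ON d.source_id = c.id;
ROLLBACK;

-- To see the plans of statements run inside a plpgsql function,
-- auto_explain can write them to the server log.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;
SET auto_explain.log_nested_statements = on;
```

A standalone EXPLAIN does not necessarily show the same plan as the 
statement inside the function, which is why the auto_explain part is 
included.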

By the way, I also found that the problem does not occur if I commit 
before the cache updates. This was with PostgreSQL 9.6.3 running on 
Windows x64, if that matters.
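
Roughly, the variant that avoids the problem looks like the sketch 
below. The names import_data and rebuild_cache() are hypothetical 
stand-ins for my real table and function, and the import step itself is 
left out.

```sql
-- Hypothetical names: import_data (table), rebuild_cache() (function).

BEGIN;
-- (bulk import into import_data happens here)
ANALYZE import_data;
COMMIT;  -- committing here, before the cache updates, avoids the slow plan

BEGIN;
SELECT rebuild_cache();  -- the plpgsql function that runs the slow query
COMMIT;
```

In the original, problematic version there is no COMMIT between the 
ANALYZE and the cache updates; everything runs in one transaction.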

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)