Evan Martin <postgresql2@xxxxxxxxxxxxxxxxx> writes: > I have an application that imports a lot of data and the does some > queries on it to build some caches in the database, all in one long > transaction. One of those cache updates repeatedly calls a plpgsql > function, which internally does some SQL queries. Sometimes this is > much, much slower than usual: 3-7 hours instead of 12-15 minutes. It was > totally reproducible when it happened, though (running on the same > machine, same input data). > It turns out that the problem only happens when the "log_lock_waits" > setting was OFF! Many machines had it ON (to troubleshoot a different > problem), so they never experienced it. > I eventually tracked it down to the query plan chosen for one particular > query in the plpgsql function: using a Nested Loop makes it fast and > using a Hash Join makes it very slow. Running an ANALYZE on one of the > tables involved fixes the problem - the fast query plan is chosen all > the time. This itself is a bit strange, because I was already running > ANALYZE on all tables after the data import - it seems that I needed to > run it a second time? Are you using the problematic function earlier in the process? I wonder if it's cached a bad plan that dates from when there was much less data in the table. I also wonder if maybe the specific table is being updated after the ANALYZEs. In the situation you've got here, you can't expect any help from auto-analyze; only your own manual ANALYZEs are going to be able to see the uncommitted data in the tables. > But what I'd really like to understand is: why did > setting log_lock_waits to ON always change the query plan to use a > Nested Loop? It's just not something I'd ever expect to affect a query plan. TBH, I don't believe it. There are a lot of moving parts here, but I don't see how that could be relevant. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general