On Fri, Nov 30, 2018 at 8:05 AM Chris Mair <chris@xxxxxxxx> wrote:
> We're kind of pulling out our hair here, any ideas?
You might try issuing the command
analyze;
right *before* the command that hangs.
You might consider trying the "auto_explain" module ( https://www.postgresql.org/docs/current/auto-explain.html ). This will let you "trap" the query plan used for the long-running query. Then compare that plan against a manually run EXPLAIN taken when the query runs quickly. If the two plans differ, bad statistics are the likely culprit and Chris' suggestion of running analyze should help.
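As a sketch (the thresholds here are example values, not recommendations), auto_explain can be enabled for a single session like this:

```sql
-- Load the module for this session only (superuser required);
-- for all sessions, set session_preload_libraries in postgresql.conf instead.
LOAD 'auto_explain';

-- Log the plan of any statement taking longer than 5 seconds.
SET auto_explain.log_min_duration = '5s';

-- Optionally include actual row counts and timings in the logged plan.
SET auto_explain.log_analyze = on;
```

The captured plan appears in the server log, where it can be compared against a manual EXPLAIN of the same query.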
You only need to analyze those tables used in the query and, most likely, only tables that were substantially changed within a moderately short period prior to the start of the query.
Autovacuum, which handles analyze as well, typically defaults to checking for tables that need attention once a minute. So for processes with many steps it becomes "luck of the draw" whether a needed analyze runs after a substantial table change and before that table is next used.
We frequently put specific "analyze" statements in such scripts immediately following bulk-update statements.
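A minimal sketch of that pattern (the table name here is hypothetical):

```sql
-- Bulk change that invalidates the planner's statistics for this table.
UPDATE orders SET status = 'archived' WHERE created < now() - interval '1 year';

-- Refresh statistics immediately, rather than waiting for autovacuum,
-- so the next query against the table gets a sensible plan.
ANALYZE orders;
```

Analyzing just the changed table keeps the extra cost small compared to a database-wide ANALYZE.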
Cheers,
Steve