On Mon, 12 Jul 2021 at 23:16, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
The backtraces you captured look like the query is not "hung", it's
just computing away.
He mentioned earlier that the query was hung as 'active' for 8 hours and on.
incase this is due to bad plan,@Jurrie Overgoor is it also possible for you to run manually
`vacuumdb -a -v` from the terminal, each time before you run your test suite for some runs, do you still get the same issue?
I have a feeling repeated runs may have caused a lot of bloat on some tables which might have not been reclaimed by autovacuum runs.
I configured Jenkins to run that command prior to executing the tests. I got 5 successful runs, no hanging queries. Then I reverted and ran again. The first and second run were ok; the third run hung again. So your hunch might be right.
You might need
to investigate by altering your application to capture "EXPLAIN ..."
output just before the troublesome query, so you can see if it gets
a different plan in the slow cases.
Then I tried this. The query plans are indeed not consistent.
Most of the time the first line of the
query plan is: Unique (cost=4892.35..4892.35 rows=1 width=64)
[1]
I have seen other costs: 5818.30, 6350.85
and 6514.73. They all complete correctly. [2], [3], [4]
The plan that leaves the query hanging in
the 'active' state starts with: Unique (cost=241.81..241.82
rows=1 width=64) [5]
That's clearly much lower than the rest.
So I suspect the planner making a 'wrong' guess there, causing a
bad plan, and a long time to execute. For reference, the
executed query is [6].
Now, where to go from here? Is this considered a bug in PostgreSQL, or am I misusing the database engine by doing DROP DATABASE and CREATE DATABASE over and over again? I must say that I never saw this behavior on PostgreSQL 9.6, so in that regard it might be considered a bug.....?
What can I do to get to the bottom of this? Should I export the content of some metadata tables prior to executing the hanging query? Should I `vacuumdb -a -v` prior to logging the EXPLAIN for the hanging query?
With kind regards,
Jurrie
[1]
https://jurr.org/PostgreSQL_13_hanging_query/normal.txt
[2] https://jurr.org/PostgreSQL_13_hanging_query/alt1.txt
[3] https://jurr.org/PostgreSQL_13_hanging_query/alt3.txt
[4] https://jurr.org/PostgreSQL_13_hanging_query/alt2.txt
[5] https://jurr.org/PostgreSQL_13_hanging_query/hang.txt
[6] https://jurr.org/PostgreSQL_13_hanging_query/query.txt