Hi Michael,
Thanks a lot for having another look at the query in more detail. In short, you are right: I fired off the Liquibase scripts and observed the exact query that was hanging in pg_stat_activity. The query was:
SELECT FK.TABLE_NAME as "TABLE_NAME",
       CU.COLUMN_NAME as "COLUMN_NAME",
       PK.TABLE_NAME as "REFERENCED_TABLE_NAME",
       PT.COLUMN_NAME as "REFERENCED_COLUMN_NAME",
       C.CONSTRAINT_NAME as "CONSTRAINT_NAME"
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK
        ON C.CONSTRAINT_NAME = FK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK
        ON C.UNIQUE_CONSTRAINT_NAME = PK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU
        ON C.CONSTRAINT_NAME = CU.CONSTRAINT_NAME
INNER JOIN (
    SELECT i1.TABLE_NAME, i2.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS i1
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE i2
            ON i1.CONSTRAINT_NAME = i2.CONSTRAINT_NAME
    WHERE i1.CONSTRAINT_TYPE = 'PRIMARY KEY'
) PT ON PT.TABLE_NAME = PK.TABLE_NAME
WHERE lower(FK.TABLE_NAME) = 'secrole_condcollection'
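(For reference, I spotted it with a check along these lines against pg_stat_activity — just a sketch of what I ran, longest-running statements first:)

-- Sketch: list currently active statements, longest-running first
SELECT pid, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;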
I reran that information_schema query twice, once with
set enable_hashjoin = false;
and once with
set enable_hashjoin = true;
I observed that the join order was very, very similar between the hash and the index plans. I also reran both queries with random_page_cost set to 2, 1.5, and 1.0 and observed no difference whatsoever; the planner always chose the hash joins over sort/index nested loops. seq_page_cost is set to its default value of 1. The tables behind the views have no more than 10K rows and do not exceed 400KB of space. work_mem is set to 256MB, effective_cache_size is 9GB, and the machine has somewhere around 32-64GB of RAM, an SSD as the primary drive, and 140 connections configured. The query planner, of course, thinks that the overall plan including the hash joins is better:
cost=2174.36..13670.47 (hash)
vs
cost=1736.10..18890.44 (index/sort join)
but I think there is a problem there, because I don't think the pre-computed hashes can be reused over and over again, while the sort/index joins end up hitting the same buffers. Or am I wrong?
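For what it's worth, here is roughly how I compared buffer usage between the two plan shapes (a sketch using a simplified subset of the query above; enable_hashjoin is only toggled for the session):

-- Sketch: inspect plan shape and shared-buffer hits with hash joins disabled,
-- then repeat with the default (enable_hashjoin = on) and compare.
SET enable_hashjoin = false;
EXPLAIN (ANALYZE, BUFFERS)
SELECT fk.table_name, cu.column_name
FROM information_schema.table_constraints fk
JOIN information_schema.key_column_usage cu
  ON fk.constraint_name = cu.constraint_name
WHERE lower(fk.table_name) = 'secrole_condcollection';
RESET enable_hashjoin;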
More details about the query plans as well as the complete set of settings can be found in the original email at https://www.postgresql.org/message-id/CAAUL%3DcFcvUo%3D7b4T-K5PqiqrF6etp59qcgv77DyK2Swa4VhYuQ%40mail.gmail.com
If you could have another look into what's going on, I'd appreciate it a lot. On PostgreSQL 9.6 our setup goes through the Liquibase scripts in about 5 minutes, while on PostgreSQL 12 with the hash joins it can take up to 1.5 hours.
Cheers,
Arturas
On Mon, Sep 27, 2021 at 4:12 PM Michael Lewis <mlewis@xxxxxxxxxxx> wrote:
I'm unclear on what you changed to get the planner to choose one vs the other. Did you disable hash joins? Without the full plan to review, it is tough to agree with any conclusion that these particular nodes are troublesome. It might be that this was the right choice for that part of the plan, but improper estimates at an earlier step were problematic.
What configs have you changed, such as work_mem, random_page_cost, and so on? If random_page_cost & seq_page_cost are still at their default values, then the planner will tend to do more seq scans, I believe, and hash them to join with large sets of data, rather than do nested loop index scans. I think that's how that works. Given the lack of flexibility to change the query, you might be able to set a few configs for the user that runs these schema checks, if you can find changes that make an overall improvement.

Michael Lewis | Database Engineer
Entrata
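For instance, per-role settings can be applied roughly like this (a sketch; the role name liquibase_user and the values are placeholders, not a recommendation):

-- Sketch: planner settings scoped to the role that runs the Liquibase checks
-- ("liquibase_user" is a placeholder role name; values are illustrative only)
ALTER ROLE liquibase_user SET random_page_cost = 1.1;
ALTER ROLE liquibase_user SET enable_hashjoin = off;
-- The settings take effect for new sessions of that role; current per-role
-- settings can be inspected in pg_db_role_setting.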