Hi Michael,
Thanks a lot for having another look at the query in more detail. In short, you are right: I fired off the Liquibase scripts and observed the exact query that was hanging in pg_stat_activity. The query was:
SELECT FK.TABLE_NAME as "TABLE_NAME",
       CU.COLUMN_NAME as "COLUMN_NAME",
       PK.TABLE_NAME as "REFERENCED_TABLE_NAME",
       PT.COLUMN_NAME as "REFERENCED_COLUMN_NAME",
       C.CONSTRAINT_NAME as "CONSTRAINT_NAME"
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK
        ON C.CONSTRAINT_NAME = FK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK
        ON C.UNIQUE_CONSTRAINT_NAME = PK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU
        ON C.CONSTRAINT_NAME = CU.CONSTRAINT_NAME
INNER JOIN (
    SELECT i1.TABLE_NAME, i2.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS i1
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE i2
            ON i1.CONSTRAINT_NAME = i2.CONSTRAINT_NAME
    WHERE i1.CONSTRAINT_TYPE = 'PRIMARY KEY'
) PT ON PT.TABLE_NAME = PK.TABLE_NAME
WHERE lower(FK.TABLE_NAME) = 'secrole_condcollection'
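(For reference, I spotted it with a check along these lines against pg_stat_activity — just a sketch of what I ran, longest-running statements first:)

-- Sketch: list currently active statements, longest-running first
SELECT pid, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;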
I reran that information_schema query twice, once with
set enable_hashjoin = false;
and once with
set enable_hashjoin = true;
I observed that the join order was very, very similar between the hash and the index plans. I also reran both queries with random_page_cost set to 2, 1.5, and 1.0 and observed no difference whatsoever; the planner always chose the hash joins over sort/index nested loops. seq_page_cost is set to its default value of 1. The tables behind the views have no more than 10K rows and do not exceed 400KB of space. work_mem is set to 256MB, effective_cache_size is 9GB, and the machine has somewhere around 32-64GB of RAM, an SSD as the primary drive, and 140 connections configured. The query planner, of course, thinks that the overall plan including the hash joins is better:
cost=2174.36..13670.47 (hash)
vs
cost=1736.10..18890.44 (index/sort join)
but I think there is a problem there, because I don't think the pre-computed hashes can be reused over and over again, while the sort/index joins end up hitting the same buffers. Or am I wrong?
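For what it's worth, here is roughly how I compared buffer usage between the two plan shapes (a sketch using a simplified subset of the query above; enable_hashjoin is only toggled for the session):

-- Sketch: inspect plan shape and shared-buffer hits with hash joins disabled,
-- then repeat with the default (enable_hashjoin = on) and compare.
SET enable_hashjoin = false;
EXPLAIN (ANALYZE, BUFFERS)
SELECT fk.table_name, cu.column_name
FROM information_schema.table_constraints fk
JOIN information_schema.key_column_usage cu
  ON fk.constraint_name = cu.constraint_name
WHERE lower(fk.table_name) = 'secrole_condcollection';
RESET enable_hashjoin;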
More details about the query plans as well as the complete set of settings can be found in the original email at https://www.postgresql.org/message-id/CAAUL%3DcFcvUo%3D7b4T-K5PqiqrF6etp59qcgv77DyK2Swa4VhYuQ%40mail.gmail.com
If you could have another look into what's going on, I'd appreciate it a lot. On PostgreSQL 9.6 our setup goes through the Liquibase scripts in about 5 minutes, while on PostgreSQL 12 with the hash joins it can take up to 1.5 hours.
Cheers,
Arturas
On Mon, Sep 27, 2021 at 4:12 PM Michael Lewis <mlewis@xxxxxxxxxxx> wrote:
I'm unclear on what you changed to get the planner to choose one vs the other. Did you disable hash joins? Without the full plan to review, it is tough to agree with any conclusion that these particular nodes are troublesome. It might be that this was the right choice for that part of the plan, but improper estimates at an earlier step were problematic.
What configs have you changed, such as work_mem, random_page_cost, and so on? If random_page_cost & seq_page_cost are still at their default values, then the planner will tend to do more seq scans, I believe, and hash them to join with large sets of data, rather than do nested loop index scans. I think that's how that works. Given the lack of flexibility to change the query, you might be able to set a few configs for the user that runs these schema checks, if you can find changes that make an overall improvement.

Michael Lewis | Database Engineer
Entrata
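For instance, per-role settings can be applied roughly like this (a sketch; the role name liquibase_user and the values are placeholders, not a recommendation):

-- Sketch: planner settings scoped to the role that runs the Liquibase checks
-- ("liquibase_user" is a placeholder role name; values are illustrative only)
ALTER ROLE liquibase_user SET random_page_cost = 1.1;
ALTER ROLE liquibase_user SET enable_hashjoin = off;
-- The settings take effect for new sessions of that role; current per-role
-- settings can be inspected in pg_db_role_setting.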