
Helping planner to choose sequential scan when it improves performance

Hi all,

I recently started at a new firm and have been trying to help the team grok certain planner behavior. A stripped-down example of the sort of join we do in the database looks like this, joining two tables of roughly one million rows each:

-- VACUUM (FULL, VERBOSE, ANALYZE), run the query twice first, then...
EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
SELECT
    ci.conversation_uuid,
    ci.item_uuid,
    ci.tenant_id,
    it.item_subject,
    it.item_body_start
FROM
    conversation_item AS ci
INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;

-- The necessary DDL that creates these tables and indexes is attached. I've commented out some extra stuff that isn't directly related to the above query.
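In case the attachment doesn't come through, here is a rough sketch of the relevant shapes, reconstructed from the query and the plans below; column types are guesses on my part and the attached file is authoritative:

```sql
-- Assumed shapes only; actual types/constraints are in the attached DDL.
CREATE TABLE conversation_item (
    conversation_uuid uuid NOT NULL,
    item_uuid         uuid NOT NULL,
    tenant_id         integer NOT NULL
);

CREATE TABLE item_text (
    item_uuid       uuid NOT NULL,
    item_subject    text,
    item_body_start text
);

-- The hash index probed by the nested loop in Plan A.
CREATE INDEX conversation_item_item_hash_index
    ON conversation_item USING hash (item_uuid);
```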

Depending on configuration, we see different performance (EXPLAIN output below):

PLAN A (default config, effective_cache_size just shy of 15GB): 3.829 seconds. A nested loop probes the hash index `conversation_item_item_hash_index` once per row of item_text. Although a single probe is cheap, the total time adds up because the probe is repeated ~1.3 million times.

PLAN B (enable_indexscan off, effective cache same as before): 3.254 seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially and conversation_item is hashed before results are combined with a hash join.

PLAN C (random_page_cost = 8.0 instead of the default 4.0, effective cache same as before): 2.959 seconds (~23% speedup, sometimes 38%). Same overall plan as Plan B, with some differences in buffers and I/O. Note that we had to raise it all the way to 8.0 before planner behavior changed; 5.0, 6.0, and 7.0 made no difference.
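For anyone reproducing this: the non-default values visible in the Settings lines of plans B and C can be applied per session, along these lines (each was set on its own session; setting both at once would conflate the two experiments):

```sql
-- Plan B: discourage index scans for this session only; the planner
-- falls back to the sequential-scan + hash-join plan.
SET enable_indexscan = off;

-- Plan C: tell the planner that random page reads are costlier
-- relative to sequential reads (the default random_page_cost is 4.0).
SET random_page_cost = 8.0;

-- RESET enable_indexscan; / RESET random_page_cost; undoes either.
```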

Environment:

Postgres 15.2
Amazon RDS — db.m6g.2xlarge


Questions:
  1. In Plan A, what factors cause the planner to select a substantially slower plan, despite having up-to-date statistics on the row counts involved?
  2. Is there a substantial difference between the hash built on the fly in Plans B and C and the hash index used in Plan A? Can I assume they are essentially the same, or do they differ in how they're applied?
  3. Is it common to see random_page_cost set as high as 8.0? We would of course need to investigate the net impact on other queries before adopting it as a general setting, but is it a proposal worth considering?
  4. Maybe we are barking up the wrong tree with the previous questions. Are there other configuration parameters we should consider first to improve performance in situations like the one illustrated?
  5. Are there other problems with our schema, query, or plans shown here? Other approaches (or tools/analyses) we should consider?
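On question 3 specifically: if the change did prove a net win, my understanding is that it could be scoped at several levels rather than applied cluster-wide (object names below are placeholders; on RDS the cluster-wide value would come from the parameter group rather than ALTER SYSTEM):

```sql
-- Cluster-wide (self-managed Postgres; on RDS use the parameter group):
ALTER SYSTEM SET random_page_cost = 8.0;
SELECT pg_reload_conf();

-- Scoped to one database ("appdb" is a placeholder):
ALTER DATABASE appdb SET random_page_cost = 8.0;

-- Scoped to a tablespace ("fast_ssd" is a placeholder):
ALTER TABLESPACE fast_ssd SET (random_page_cost = 8.0);
```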

PLAN C (random_page_cost = 8.0):

Hash Join  (cost=53933.69..255802.87 rows=1395542 width=216) (actual time=422.466..2868.948 rows=1394633 loops=1)
  Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start
  Inner Unique: true
  Hash Cond: (it.item_uuid = ci.item_uuid)
  Buffers: shared hit=12283 read=95974 dirtied=77, temp read=44113 written=44113
  I/O Timings: shared/local read=462.195, temp read=86.413 write=423.363
  WAL: records=1 bytes=58
  ->  Seq Scan on item_text it  (cost=0.00..110641.18 rows=1401518 width=196) (actual time=0.012..854.842 rows=1394633 loops=1)
        Output: it.item_subject, it.item_body_start, it.item_uuid
        Buffers: shared hit=652 read=95974 dirtied=77
        I/O Timings: shared/local read=462.195
        WAL: records=1 bytes=58
  ->  Hash  (cost=25586.42..25586.42 rows=1395542 width=36) (actual time=422.386..422.387 rows=1394633 loops=1)
        Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id
        Buckets: 131072  Batches: 16  Memory Usage: 6799kB
        Buffers: shared hit=11631, temp written=8932
        I/O Timings: temp write=78.785
        ->  Seq Scan on conversation_item ci  (cost=0.00..25586.42 rows=1395542 width=36) (actual time=0.004..133.304 rows=1394633 loops=1)
              Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id
              Buffers: shared hit=11631
Settings: effective_cache_size = '16053152kB', jit = 'off', random_page_cost = '8'
Query Identifier: 5005100605804837348
Planning:
  Buffers: shared hit=398 read=7
  I/O Timings: shared/local read=1.400
Planning Time: 2.379 ms
Execution Time: 2956.264 ms

Generated with:
	EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
	SELECT
		ci.conversation_uuid,
		ci.item_uuid,
		ci.tenant_id,
		it.item_subject,
		it.item_body_start
	FROM
		conversation_item AS ci
		INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
PLAN B (enable_indexscan = off):

Hash Join  (cost=53126.96..251632.49 rows=1374665 width=216) (actual time=475.472..3162.922 rows=1374687 loops=1)
  Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start
  Inner Unique: true
  Hash Cond: (it.item_uuid = ci.item_uuid)
  Buffers: shared hit=1376 read=105087 written=8220, temp read=43290 written=43290
  I/O Timings: shared/local read=319.560 write=38.895, temp read=92.529 write=420.118
  ->  Seq Scan on item_text it  (cost=0.00..108786.21 rows=1378021 width=196) (actual time=0.012..730.051 rows=1374687 loops=1)
        Output: it.item_subject, it.item_body_start, it.item_uuid
        Buffers: shared hit=1238 read=93768 written=5951
        I/O Timings: shared/local read=289.139 write=29.014
  ->  Hash  (cost=25203.65..25203.65 rows=1374665 width=36) (actual time=475.372..475.373 rows=1374687 loops=1)
        Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id
        Buckets: 131072  Batches: 16  Memory Usage: 6718kB
        Buffers: shared hit=138 read=11319 written=2269, temp written=8804
        I/O Timings: shared/local read=30.421 write=9.881, temp write=80.814
        ->  Seq Scan on conversation_item ci  (cost=0.00..25203.65 rows=1374665 width=36) (actual time=0.008..184.562 rows=1374687 loops=1)
              Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id
              Buffers: shared hit=138 read=11319 written=2269
              I/O Timings: shared/local read=30.421 write=9.881
Settings: effective_cache_size = '16053152kB', enable_indexscan = 'off', jit = 'off'
Query Identifier: 5005100605804837348
Planning:
  Buffers: shared hit=379 read=12
  I/O Timings: shared/local read=0.810
Planning Time: 1.724 ms
Execution Time: 3254.486 ms

Generated with:
	EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
	SELECT
		ci.conversation_uuid,
		ci.item_uuid,
		ci.tenant_id,
		it.item_subject,
		it.item_body_start
	FROM
		conversation_item AS ci
		INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
PLAN A (default configuration):

Nested Loop  (cost=0.00..194705.88 rows=1340602 width=213) (actual time=0.018..3739.905 rows=1339642 loops=1)
  Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start
  Inner Unique: true
  Buffers: shared hit=2681522 read=90343 dirtied=203
  I/O Timings: shared/local read=233.152
  WAL: records=203 fpi=203 bytes=428058
  ->  Seq Scan on item_text it  (cost=0.00..105892.07 rows=1350507 width=193) (actual time=0.011..628.038 rows=1339642 loops=1)
        Output: it.item_subject, it.item_body_start, it.item_uuid
        Buffers: shared hit=2044 read=90343 dirtied=193
        I/O Timings: shared/local read=233.152
        WAL: records=193 fpi=193 bytes=380966
  ->  Index Scan using conversation_item_item_hash_index on conversation_item ci  (cost=0.00..0.07 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=1339642)
        Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id
        Index Cond: (ci.item_uuid = it.item_uuid)
        Rows Removed by Index Recheck: 0
        Buffers: shared hit=2679478 dirtied=10
        WAL: records=10 fpi=10 bytes=47092
Settings: effective_cache_size = '16053152kB', jit = 'off'
Query Identifier: 5005100605804837348
Planning:
  Buffers: shared hit=12 read=4
  I/O Timings: shared/local read=0.015
Planning Time: 0.236 ms
Execution Time: 3829.002 ms

Generated with:
	EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
	SELECT
		ci.conversation_uuid,
		ci.item_uuid,
		ci.tenant_id,
		it.item_subject,
		it.item_body_start
	FROM
		conversation_item AS ci
		INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;

Attachment: create_statements.postgre.sql
Description: Binary data

