Hello! I tried asking this over on the general listserv before realizing pgsql-performance is probably better suited.
Questions:
Hi all,
I recently started at a new firm and have been trying to help to grok certain planner behavior. A strip-down example of the sort of join we do in the database looks like this, wherein we join two tables that have about 1 million rows:
-- VACUUM (FULL, VERBOSE, ANALYZE), run the query twice first, then...
EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
SELECT
ci.conversation_uuid,
ci.item_uuid,
ci.tenant_id,
it.item_subject,
it.item_body_start
FROM
conversation_item AS ci
INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
SELECT
ci.conversation_uuid,
ci.item_uuid,
ci.tenant_id,
it.item_subject,
it.item_body_start
FROM
conversation_item AS ci
INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
-- The necessary DDL that creates these tables and indexes is attached. I've commented out some extra stuff that isn't directly related to the above query.
Depending on config, we get different results in terms of performance (EXPLAIN output attached):
PLAN A (default config, effective cache size just shy of 15GB): 3.829 seconds. A nested loop is used to probe the hash index `conversation_item_item_hash_index` for each row of item_text. Although the cost of probing once is low, a fair amount of time passes because the operation is repeated ~1.3 million times.
PLAN B (enable_indexscan off, effective cache same as before): 3.254 seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially and conversation_item is hashed before results are combined with a hash join.
PLAN C: (random_page_cost = 8.0, instead of default 4, effective cache same as before): 2.959 (~23% speedup, sometimes 38%). Same overall plan as PLAN B, some differences in buffers and I/O. I'll note we had to get to 8.0 before we saw a change to planner behavior; 5.0, 6.0, and 7.0 were too low to make a difference.
Environment:
Postgres 15.2
Amazon RDS — db.m6g.2xlarge
- In Plan A, what factors (read: which GUC settings) are causing the planner to select a substantially slower plan despite having recent stats about number of rows?
- Is there a substantial difference between the on-the-fly hash done in Plan B and Plan C compared to the hash-index used in Plan A? Can I assume they are essentially the same? Perhaps there are there differences in how they're applied?
- Is it common to see values for random_page_cost set as high as 8.0? We would of course need to investigate whether we see a net positive or net negative impact on other queries, to adopt this as a general setting, but is it a proposal we should actually consider?
- Maybe we are barking up the wrong tree with the previous questions. Are there other configuration parameters we should consider first to improve performance in situations like the one illustrated?
- Are there other problems with our schema, query, or plans shown here? Other approaches (or tools/analyses) we should consider?
Attached also are the results of
SELECT name, current_setting(name), source FROM pg_settings WHERE source NOT IN ('default', 'override');
- Patrick
Hash Join (cost=53933.69..255802.87 rows=1395542 width=216) (actual time=422.466..2868.948 rows=1394633 loops=1)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start" Inner Unique: true" Hash Cond: (it.item_uuid = ci.item_uuid)" Buffers: shared hit=12283 read=95974 dirtied=77, temp read=44113 written=44113" I/O Timings: shared/local read=462.195, temp read=86.413 write=423.363" WAL: records=1 bytes=58" -> Seq Scan on item_text it (cost=0.00..110641.18 rows=1401518 width=196) (actual time=0.012..854.842 rows=1394633 loops=1)" Output: it.item_subject, it.item_body_start, it.item_uuid" Buffers: shared hit=652 read=95974 dirtied=77" I/O Timings: shared/local read=462.195" WAL: records=1 bytes=58" -> Hash (cost=25586.42..25586.42 rows=1395542 width=36) (actual time=422.386..422.387 rows=1394633 loops=1)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id" Buckets: 131072 Batches: 16 Memory Usage: 6799kB" Buffers: shared hit=11631, temp written=8932" I/O Timings: temp write=78.785" -> Seq Scan on conversation_item ci (cost=0.00..25586.42 rows=1395542 width=36) (actual time=0.004..133.304 rows=1394633 loops=1)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id" Buffers: shared hit=11631" Settings: effective_cache_size = '16053152kB', jit = 'off', random_page_cost = '8'" Query Identifier: 5005100605804837348" Planning:" Buffers: shared hit=398 read=7" I/O Timings: shared/local read=1.400" Planning Time: 2.379 ms" Execution Time: 2956.264 ms" Generated with: EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY) SELECT ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start FROM conversation_item AS ci INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
Hash Join (cost=53126.96..251632.49 rows=1374665 width=216) (actual time=475.472..3162.922 rows=1374687 loops=1)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start" Inner Unique: true" Hash Cond: (it.item_uuid = ci.item_uuid)" Buffers: shared hit=1376 read=105087 written=8220, temp read=43290 written=43290" I/O Timings: shared/local read=319.560 write=38.895, temp read=92.529 write=420.118" -> Seq Scan on item_text it (cost=0.00..108786.21 rows=1378021 width=196) (actual time=0.012..730.051 rows=1374687 loops=1)" Output: it.item_subject, it.item_body_start, it.item_uuid" Buffers: shared hit=1238 read=93768 written=5951" I/O Timings: shared/local read=289.139 write=29.014" -> Hash (cost=25203.65..25203.65 rows=1374665 width=36) (actual time=475.372..475.373 rows=1374687 loops=1)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id" Buckets: 131072 Batches: 16 Memory Usage: 6718kB" Buffers: shared hit=138 read=11319 written=2269, temp written=8804" I/O Timings: shared/local read=30.421 write=9.881, temp write=80.814" -> Seq Scan on conversation_item ci (cost=0.00..25203.65 rows=1374665 width=36) (actual time=0.008..184.562 rows=1374687 loops=1)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id" Buffers: shared hit=138 read=11319 written=2269" I/O Timings: shared/local read=30.421 write=9.881" Settings: effective_cache_size = '16053152kB', enable_indexscan = 'off', jit = 'off'" Query Identifier: 5005100605804837348" Planning:" Buffers: shared hit=379 read=12" I/O Timings: shared/local read=0.810" Planning Time: 1.724 ms" Execution Time: 3254.486 ms" Generated with: EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY) SELECT ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start FROM conversation_item AS ci INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
Nested Loop (cost=0.00..194705.88 rows=1340602 width=213) (actual time=0.018..3739.905 rows=1339642 loops=1)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start" Inner Unique: true" Buffers: shared hit=2681522 read=90343 dirtied=203" I/O Timings: shared/local read=233.152" WAL: records=203 fpi=203 bytes=428058" -> Seq Scan on item_text it (cost=0.00..105892.07 rows=1350507 width=193) (actual time=0.011..628.038 rows=1339642 loops=1)" Output: it.item_subject, it.item_body_start, it.item_uuid" Buffers: shared hit=2044 read=90343 dirtied=193" I/O Timings: shared/local read=233.152" WAL: records=193 fpi=193 bytes=380966" -> Index Scan using conversation_item_item_hash_index on conversation_item ci (cost=0.00..0.07 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=1339642)" Output: ci.conversation_uuid, ci.item_uuid, ci.tenant_id" Index Cond: (ci.item_uuid = it.item_uuid)" Rows Removed by Index Recheck: 0" Buffers: shared hit=2679478 dirtied=10" WAL: records=10 fpi=10 bytes=47092" Settings: effective_cache_size = '16053152kB', jit = 'off'" Query Identifier: 5005100605804837348" Planning:" Buffers: shared hit=12 read=4" I/O Timings: shared/local read=0.015" Planning Time: 0.236 ms" Execution Time: 3829.002 ms" Generated with: EXPLAIN(ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY) SELECT ci.conversation_uuid, ci.item_uuid, ci.tenant_id, it.item_subject, it.item_body_start FROM conversation_item AS ci INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
Attachment:
create_statements.postgre.sql
Description: application/sql
name,current_setting,source archive_library,rds_archive,configuration file archive_mode,on,configuration file archive_timeout,5min,configuration file autovacuum_analyze_scale_factor,0.05,configuration file autovacuum_max_workers,3,configuration file autovacuum_naptime,15s,configuration file autovacuum_vacuum_cost_limit,600,configuration file autovacuum_vacuum_scale_factor,0.1,configuration file autovacuum_work_mem,1003322kB,configuration file checkpoint_completion_target,0.9,configuration file compute_query_id,auto,configuration file default_toast_compression,lz4,configuration file effective_cache_size,16053152kB,configuration file fsync,on,configuration file full_page_writes,on,configuration file hot_standby,off,configuration file hot_standby_feedback,on,configuration file huge_pages,on,configuration file idle_in_transaction_session_timeout,1d,configuration file ignore_invalid_pages,off,configuration file jit,off,configuration file listen_addresses,*,command line lo_compat_privileges,off,configuration file log_autovacuum_min_duration,10s,configuration file log_checkpoints,on,configuration file log_destination,stderr,configuration file log_file_mode,644,configuration file log_line_prefix,%t:%r:%u@%d:[%p]:,configuration file log_min_duration_statement,4s,configuration file log_rotation_age,1h,configuration file log_temp_files,0,configuration file log_truncate_on_rotation,off,configuration file logging_collector,on,configuration file maintenance_work_mem,526332kB,configuration file max_connections,3449,configuration file max_locks_per_transaction,64,configuration file max_parallel_workers,8,configuration file max_prepared_transactions,0,configuration file max_replication_slots,20,configuration file max_stack_depth,6MB,configuration file max_standby_archive_delay,5min,configuration file max_standby_streaming_delay,5min,configuration file max_wal_senders,20,configuration file max_wal_size,2GB,configuration file max_worker_processes,16,configuration file min_wal_size,192MB,configuration file pg_stat_statements.track,all,configuration file rds.force_autovacuum_logging_level,warning,configuration file rds.internal_databases,"rdsadmin,template0",configuration file rds.logical_replication,off,configuration file rds.rds_superuser_reserved_connections,2,configuration file recovery_init_sync_method,syncfs,configuration file recovery_prefetch,off,configuration file remove_temp_files_after_crash,off,configuration file shared_buffers,8026576kB,configuration file ssl,on,configuration file ssl_ca_file,/rdsdbdata/rds-metadata/ca-cert.pem,configuration file ssl_cert_file,/rdsdbdata/rds-metadata/server-cert.pem,configuration file ssl_key_file,/rdsdbdata/rds-metadata/server-key.pem,configuration file superuser_reserved_connections,3,configuration file synchronous_commit,on,configuration file track_activity_query_size,4kB,configuration file track_functions,pl,configuration file track_io_timing,on,configuration file track_wal_io_timing,on,configuration file unix_socket_group,rdsdb,configuration file update_process_title,on,configuration file vacuum_cost_page_miss,5,configuration file vacuum_failsafe_age,1200000000,configuration file vacuum_multixact_failsafe_age,1200000000,configuration file wal_compression,zstd,configuration file wal_keep_size,2GB,configuration file wal_level,replica,configuration file wal_receiver_create_temp_slot,off,configuration file wal_receiver_timeout,30s,configuration file wal_sender_timeout,30s,configuration file