Hi,
we faced a performance issue when joining 2 partitioned tables
(declarative partitioning). The planner chooses nested loop while we
expect hash join.
The query and the plan are available here: https://explain.depesz.com/s/23r9
table_1 and table_2 are hash partitioned using volume_id column. Usually
we make analyze on partitions. We do not make analyze on the partitioned
table (parent).
However, if we run 'analyze' on the partitioned table then planner
starts choosing hash join. As a comparison, the execution using nested
loop takes about 15 minutes and if it is done using hash join then the
query lasts for about 1 minute. When running 'analyze' for the
partitioned table, postgres inserts statistics for the partitioned table
into pg_stats (pg_statistics). Before that, there are only statistics
for partitions. We suspect that this is the reason for selecting bad
query plan.
The query is executed with cursor thus, in order to avoid parallel
query, I set max_parallel_workers_per_gather to 0 during tests.
We found that a similar issue was discussed in the context of
inheritance:
https://www.postgresql.org/message-id/Pine.BSO.4.64.0904161836540.11937%40leary.csoft.net
and the conclusion was to add the following paragraph to the 'analyze' doc:
> If the table being analyzed has one or more children, ANALYZE will
gather statistics twice: once on the rows of the parent table only, and
a second time on the rows of the parent table with all of its children.
This second set of statistics is needed when planning queries that
traverse the entire inheritance tree. The autovacuum daemon, however,
will only consider inserts or updates on the parent table itself when
deciding whether to trigger an automatic analyze for that table. If that
table is rarely inserted into or updated, the inheritance statistics
will not be up to date unless you run ANALYZE manually.
(https://www.postgresql.org/docs/13/sql-analyze.html)
I would appreciate if anyone could shed some light on the following
questions:
1) Is this above paragraph from docs still valid in PG 13 and does it
apply to declarative partitioning as well? Is running analyze manually
on a partitioned table needed to get proper plans for queries on
partitioned tables? Partitioned table (in the declarative way) is
”virtual” and does not keep any data so it seems that there are no
statistics that can be gathered from the table itself and statistics
from partitions should be sufficient.
2) Why does the planner need these statistics since they seem to be
unused in the query plan. The query plan uses only partitions, not the
partitioned table.
PostgreSQL version number:
version
---------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 13.3 (Ubuntu 13.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit
(1 row)
How you installed PostgreSQL: From Ubuntu 16 repositories.
Changes made to the settings in the postgresql.conf file:
name | current_setting
| source
-------------------------------------+-----------------------------------------+----------------------
application_name | psql
| client
auto_explain.log_analyze | on
| configuration file
auto_explain.log_min_duration | 30s
| configuration file
auto_explain.log_nested_statements | on
| configuration file
auto_explain.log_timing | off
| configuration file
autovacuum_freeze_max_age | 1000000000
| configuration file
autovacuum_max_workers | 6
| configuration file
autovacuum_vacuum_cost_delay | 20ms
| configuration file
autovacuum_vacuum_cost_limit | 2000
| configuration file
checkpoint_completion_target | 0.9
| configuration file
checkpoint_timeout | 15min
| configuration file
cluster_name | 13/main
| configuration file
cpu_index_tuple_cost | 0.001
| configuration file
cpu_operator_cost | 0.0005
| configuration file
cursor_tuple_fraction | 1
| configuration file
DateStyle | ISO, MDY
| configuration file
default_statistics_target | 200
| configuration file
default_text_search_config | pg_catalog.english
| configuration file
dynamic_shared_memory_type | posix
| configuration file
effective_cache_size | 193385MB
| configuration file
effective_io_concurrency | 1000
| configuration file
external_pid_file | /var/run/postgresql/13-main.pid
| configuration file
from_collapse_limit | 15
| configuration file
geqo_threshold | 15
| configuration file
idle_in_transaction_session_timeout | 1h
| configuration file
jit_above_cost | -1
| configuration file
jit_inline_above_cost | -1
| configuration file
jit_optimize_above_cost | -1
| configuration file
join_collapse_limit | 15
| configuration file
lc_messages | en_US.UTF-8
| configuration file
lc_monetary | en_US.UTF-8
| configuration file
lc_numeric | en_US.UTF-8
| configuration file
lc_time | en_US.UTF-8
| configuration file
log_autovacuum_min_duration | 1min
| configuration file
log_checkpoints | on
| configuration file
log_connections | on
| configuration file
log_destination | stderr
| configuration file
log_directory | pg_log
| configuration file
log_disconnections | on
| configuration file
log_filename | postgresql-%Y-%m-%d_%H%M%S.log
| configuration file
log_line_prefix | %t [%p-%l] app=%a %q%u@%d
| configuration file
log_lock_waits | on
| configuration file
log_min_duration_statement | 3s
| configuration file
log_rotation_age | 1d
| configuration file
log_rotation_size | 1GB
| configuration file
log_temp_files | 0
| configuration file
log_timezone | America/New_York
| configuration file
logging_collector | on
| configuration file
maintenance_work_mem | 2GB
| configuration file
max_connections | 1000
| configuration file
max_locks_per_transaction | 1280
| configuration file
max_parallel_workers_per_gather | 6
| configuration file
max_stack_depth | 2MB
| environment variable
max_wal_size | 10GB
| configuration file
max_worker_processes | 26
| configuration file
min_wal_size | 1GB
| configuration file
pg_stat_statements.max | 2000
| configuration file
pg_stat_statements.track | all
| configuration file
pg_stat_statements.track_planning | off
| configuration file
port | 5433
| configuration file
random_page_cost | 1.5
| configuration file
shared_buffers | 8GB
| configuration file
shared_preload_libraries | pg_stat_statements,auto_explain
| configuration file
ssl | on
| configuration file
ssl_cert_file |
/etc/ssl/certs/ssl-cert-snakeoil.pem | configuration file
ssl_key_file |
/etc/ssl/private/ssl-cert-snakeoil.key | configuration file
stats_temp_directory |
/var/run/postgresql/13-main.pg_stat_tmp | configuration file
temp_buffers | 2GB
| configuration file
TimeZone | America/New_York
| configuration file
track_commit_timestamp | on
| configuration file
track_io_timing | on
| configuration file
unix_socket_directories | /var/run/postgresql
| configuration file
vacuum_freeze_table_age | 1000000000
| configuration file
wal_buffers | 128MB
| configuration file
work_mem | 758MB
| configuration file
(75 rows)
Operating system and version: Linux r730server 4.15.0-142-generic
#146~16.04.1-Ubuntu SMP Tue Apr 13 09:27:15 UTC 2021 x86_64 x86_64
x86_64 GNU/Linux
What program you're using to connect to PostgreSQL: psql
Is there anything relevant or unusual in the PostgreSQL server logs?: No
--
Best Regards
Kamil Frydel