Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]

"Seamus Abshere" <sabshere@xxxxxxxxxxxxxxxxxxxx> · Sat, 13 Feb 2021 19:09:11 -0500

hi,

I've traced this back to the formula for Parallel Append workers - log2(partitions).

The comment from Robert says: (src/backend/optimizer/path/allpaths.c)

		/*
		 * If the use of parallel append is permitted, always request at least
		 * log2(# of children) workers. 

In my case, every partition takes 1 second to scan, I have 64 cores, I have 64 partitions, and the wall time is 8 seconds with 8 workers.

I assume that if it it planned significantly more workers (16? 32? even 64?), it would get significantly faster (even accounting for transaction cost). So why doesn't it ask for more? Note that I've set max_parallel_workers=512, etc. (postgresql.conf in my first message).

Here are full plans https://gist.github.com/seamusabshere/1c1d869ee24145689f45e4124d8730d3

Best,
Seamus

On Fri, Feb 12, 2021, at 4:50 PM, Seamus Abshere wrote:
> hi,
> 
> How can I convince Postgres to use more than 8 cores?
> 
> I've got an r6gd.16xlarge with 64 vCpus and 512gb RAM running Postgres 
> 13.1 on Ubuntu 20.04.
> 
> CREATE TABLE tbl (
>   [...]
> ) PARTITION BY HASH (address_key);
> 
> It has 64 partitions - e.g. FOR VALUES WITH (modulus 64, remainder 0) etc.
> 
> We're running `SELECT COUNT(*) FROM tbl`.
> 
> I've watched top and I never see more than 8 cores going 100%.
> 
> Here is my (admittedly ridiculous) postgresql.conf:
> 
> checkpoint_completion_target = 0.9
> data_directory='/tank/postgresql/13/main'
> default_statistics_target = 100
> effective_cache_size = 381696MB
> effective_io_concurrency = 200
> enable_partition_pruning=on
> enable_partitionwise_aggregate=on
> enable_partitionwise_join=on
> listen_addresses='*'
> maintenance_work_mem = 2GB
> max_connections = 200
> max_parallel_maintenance_workers = 4
> max_parallel_workers = 512
> max_parallel_workers_per_gather = 512
> max_wal_size = 4GB
> max_worker_processes = 512
> min_wal_size = 1GB
> random_page_cost = 1.1
> shared_buffers = 127232MB
> shared_preload_libraries = 'cstore_fdw'
> synchronous_commit=off
> wal_buffers = 16MB
> work_mem = 1628560kB
> 
> Best,
> Seamus
> 
> 
> --
> Seamus Abshere, SCEA
> https://faraday.ai
> https://github.com/seamusabshere
> https://linkedin.com/in/seamusabshere