Re: Parallel hints in PostgreSQL with consistent perfromance

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxxx> · Wed, 27 Dec 2023 15:15:11 +0100

On 12/27/23 14:15, mohini mane wrote:
> Hello Team,
> I observed that increasing the degree of parallel hint* *in the SELECT
> query did not show performance improvements.
> Below are the details of sample execution with EXPLAIN ANALYZE
> *PostgreSQL Version:* v15.5
> 
> *Operating System details:* RHL 7.x
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    1
> Core(s) per socket:    16
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
> Stepping:              1
> CPU MHz:               2294.684
> BogoMIPS:              4589.36
> Hypervisor vendor:     Microsoft
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              51200K
> NUMA node0 CPU(s):     0-15
> 
> *PostgreSQL Query:* 
> Sample sql executed through psql command prompt
> force_parallel_mode=on; max_parallel_workers_per_gather=200;
> max_parallel_workers=6
> 
> explain analyze select /*+ PARALLEL(A 6) */ ctid::varchar,
>  md5("col1"||'~'||"col7"||'~'||"col9"::varchar) ,
> md5("id"||'~'||"gender"||'~'||"firstname"||'~'||"lastname"||'~'||"address"||'~'||"city"||'~'||"salary"||'~'||"pincode"||'~'||"sales"||'~'||"phone"||'~'||"amount"||'~'||"dob"||'~'||"starttime"||'~'||"timezone"||'~'||"status"||'~'||"timenow"||'~'||"timelater"||'~'||"col2"||'~'||"col3"||'~'||"col4"||'~'||"col5"||'~'||"col6"||'~'||"col8"||'~'||"col10"||'~'||"col11"||'~'||"col12"||'~'||"col13"||'~'||"col14"||'~'||"col15"||'~'||"col16"||'~'||"col17"::varchar) ,  md5('@'||"col1"||'~'||"col7"||'~'||"col9"::varchar)  from "sp_qarun"."basic2" A  order by 2,4,3;
> 

Postgres doesn't support hints, so the /* ... */ part of the query is
just a comment and doesn't effect the parallelism at all. The main thing
influencing that are the GUC values you set before.

> *Output:*
> PSQL query execution with hints 6 for 1st time => 203505.402 ms
> PSQL query execution with hints 6 for 2nd time => 27920.272 ms
> PSQL query execution with hints 6 for 3rd time => 27666.770 ms
> Only 6 workers launched, and there is no reduction in execution time
> even after increasing the degree of parallel hints in select query.
> 

It's unclear if what exactly you changed, and what case you're comparing
the timing to. As I explained earlier, the hint comment has no effect.
So if that's what you increased, it's not surprising the timing does not
change.

Also, max_parallel_workers is the maximum total number of parallel
workers, i.e. it's upper bound of max_parallel_workers_per_gather. So if
you set it to 6, there will never be more than 6 workers, no matter what
value max_parallel_workers_per_gather is set to.

FWIW force_parallel_more is really meant for testing (in the context of
developing the database itself), it's hardly the thing you should do in
any other case, like for example testing performance.

> *Table Structure:*
> create table basic2(id int,gender char,firstname varchar(3000),
> lastname varchar(3000),address varchar(3000),city varchar(900),salary
> smallint,
> pincode bigint,sales numeric,phone real,amount double precision,
> dob date,starttime timestamp,timezone TIMESTAMP WITH TIME ZONE,
> status boolean,timenow time,timelater TIME WITH TIME ZONE,col1 int,
> col2 char,col3 varchar(3000),col4 varchar(3000),col5 varchar(3000),
> col6 varchar(900),col7 smallint,col8 bigint,col9 numeric,col10 real,
> col11 double precision,col12 date,col13 timestamp,col14 TIMESTAMP WITH
> TIME ZONE,
> col15 boolean,col16 time,col17 TIME WITH TIME ZONE,primary
> key(col1,col7,col9)); 
> 
> *Table Data:* 1000000 rows with each Row has a size of 20000.
> 

Without the data we can't actually try running the query.

In general it's a good idea to show the "explain analyze" output for the
cases you're comparing. Not only that shows what the database is doing,
it also shows timings for different parts of the query, how many workers
were planned / actually started etc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company