Fwd: different execution time for the same query (and same DB status)

Francesco De Angelis <franc.tozzus@xxxxxxxxx> · Sat, 6 Mar 2021 22:40:00 +0100

Hi all,
I would appreciate if somebody could help me understanding why the query described below takes very different execution times to complete, almost completely randomly.

I have two very "simple" tables, A and B:

CREATE TABLE A (
	a1 varchar(10) NULL,
	a2 varchar(10) NULL,
	v int4 NULL
);
CREATE TABLE B (
	a1 varchar(10) NULL,
	a2 varchar(10) NULL,
	v int4 NULL
);

I load A and B with some random numbers (10^7 records each one) and, when the loading is complete, I run a query that "essentially" counts the number of the records obtained through a sequence of JOINs, which has the following form:
    SELECT count(*) from (... A join B ....)

The query does not write/update any value, it is simply a SELECT applied over the aforementioned tables (plus some unions and intersections).

The problem is the following: the query can take between 20 seconds and 4 minutes to complete. Most of times, when I run the query for the first time after the server initialisation, it takes 20 seconds; but if I re-run it again (without changing anything) right after the first execution, the probability to take more than 4 minutes is very high.
My impression is that the "status" of the internal structures of the DBMS is somehow affected by the first execution, but I cannot figure out neither what nor how I can fix it.

To give some additional information, I can state the following facts:
- There are no other processes reading or writing the tables on the schema and the status of A and B is constant.
- During the loading of A and B (process that takes more or less one minute to complete),  I get the following messages:
    LOG:  checkpoints are occurring too frequently (29 seconds apart).
    HINT:  Consider increasing the configuration parameter "max_wal_size".
    LOG:  checkpoints are occurring too frequently (18 seconds apart)
    HINT:  Consider increasing the configuration parameter "max_wal_size".
- I am running my query through DBeaver and PostgreSQL runs in a Docker container.

The version that I am using is the following: 13.2 (Debian 13.2-1.pgdg100+1).
The only configuration parameters I have changed are the following:
- force_parallel_mode = on
- max_parallel_workers_per_gather = 64
- parallel_setup_cost = 1
- parallel_tuple_cost = 0.001
- work_mem = '800MB'

I hope somebody could help me, as I really don't know why I am experiencing such a strange behaviour. 

Thanks a lot.

Best regards,
Francesco