2018-04-17 12:52 GMT+02:00 Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx>:
On 04/17/2018 07:17 AM, Pavel Stehule wrote:
Hi>
2018-04-16 22:42 GMT+02:00 Hackety Man <hacketyman@xxxxxxxxx <mailto:hacketyman@xxxxxxxxx>>:
...
A support of parallel query execution is not complete - it doesn't work in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be - the worst case of EXISTS is same like COUNT), then due disabled parallel execution the COUNT(*) is faster now. It is unfortunate, because I believe so this issue will be fixed in few years.
None of the issues seems to be particularly related to parallel query. It's much more likely a general issue with planning EXISTS / LIMIT and non-uniform data distribution.
I was wrong EXISTS are not supported. It looks like new dimension of performance issues related to parallelism. I understand so this example is worst case.
postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001'));
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Result (cost=4.08..4.09 rows=1 width=1) (actual time=423.600..423.600 rows=1 loops=1)
Buffers: shared hit=3296 read=2110
InitPlan 1 (returns $0)
-> Seq Scan on zz_noidx1 (cost=0.00..20406.00 rows=5000 width=0) (actual time=423.595..423.595 rows=0 loops=1)
Filter: (lower(text_distinct) = 'test5000001'::text)
Rows Removed by Filter: 1000000
Buffers: shared hit=3296 read=2110
Planning Time: 0.133 ms
Execution Time: 423.633 ms
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) (actual time=246.662..246.662 rows=1 loops=1)
Buffers: shared hit=817 read=549
-> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual time=246.642..246.656 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=817 read=549
-> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual time=242.168..242.169 rows=1 loops=3)
Buffers: shared hit=3360 read=2046
-> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 width=0) (actual time=242.165..242.165 rows=0 loops=3)
Filter: (lower(text_distinct) = 'test5000001'::text)
Rows Removed by Filter: 333333
Buffers: shared hit=3360 read=2046
Planning Time: 0.222 ms
Execution Time: 247.927 ms
postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001'));
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Result (cost=4.08..4.09 rows=1 width=1) (actual time=423.600..423.600 rows=1 loops=1)
Buffers: shared hit=3296 read=2110
InitPlan 1 (returns $0)
-> Seq Scan on zz_noidx1 (cost=0.00..20406.00 rows=5000 width=0) (actual time=423.595..423.595 rows=0 loops=1)
Filter: (lower(text_distinct) = 'test5000001'::text)
Rows Removed by Filter: 1000000
Buffers: shared hit=3296 read=2110
Planning Time: 0.133 ms
Execution Time: 423.633 ms
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) (actual time=246.662..246.662 rows=1 loops=1)
Buffers: shared hit=817 read=549
-> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual time=246.642..246.656 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=817 read=549
-> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual time=242.168..242.169 rows=1 loops=3)
Buffers: shared hit=3360 read=2046
-> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 width=0) (actual time=242.165..242.165 rows=0 loops=3)
Filter: (lower(text_distinct) = 'test5000001'::text)
Rows Removed by Filter: 333333
Buffers: shared hit=3360 read=2046
Planning Time: 0.222 ms
Execution Time: 247.927 ms
The cost of EXISTS is too low to use parallelism, and value is found too late.
When I decrease startup cost to 0 of parallel exec I got similar plan, similar time
postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001'));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Result (cost=2.43..2.44 rows=1 width=1) (actual time=246.398..246.402 rows=1 loops=1)
Buffers: shared hit=885 read=489
InitPlan 1 (returns $1)
-> Gather (cost=0.00..12156.00 rows=5000 width=0) (actual time=246.393..246.393 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=885 read=489
-> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 width=0) (actual time=241.067..241.067 rows=0 loops=3)
Filter: (lower(text_distinct) = 'test5000001'::text)
Rows Removed by Filter: 333333
Buffers: shared hit=3552 read=1854
Planning Time: 0.138 ms
Execution Time: 247.623 ms
(13 rows)
postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1 WHERE LOWER(text_distinct) = LOWER('Test5000001'));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Result (cost=2.43..2.44 rows=1 width=1) (actual time=246.398..246.402 rows=1 loops=1)
Buffers: shared hit=885 read=489
InitPlan 1 (returns $1)
-> Gather (cost=0.00..12156.00 rows=5000 width=0) (actual time=246.393..246.393 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=885 read=489
-> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 width=0) (actual time=241.067..241.067 rows=0 loops=3)
Filter: (lower(text_distinct) = 'test5000001'::text)
Rows Removed by Filter: 333333
Buffers: shared hit=3552 read=1854
Planning Time: 0.138 ms
Execution Time: 247.623 ms
(13 rows)
From this perspective it looks so cost of EXISTS(subselect) is maybe too low.
Regards
Pavel
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services