Hi,
I'm stumbling on an issue which seems like this one: https://www.postgresql.org/message-id/20170719152038.19353.71475%40wrigleys.postgresql.org, but I hope someone can shed some light on my specific case.
Software:
POSTGIS="3.1.4 ded6c34" [EXTENSION] PGSQL="140" GEOS="3.8.0-CAPI-1.13.1 " PROJ="6.3.1" LIBXML="2.9.10" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"
PostgreSQL 14.0 (Ubuntu 14.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
PostgreSQL 14.0 (Ubuntu 14.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2 relevant tables and indices:
CREATE TABLE osm_current.planet_osm_point
(
osm_id bigint NOT NULL,
tags hstore,
way geometry(Point,3857)
);
(
osm_id bigint NOT NULL,
tags hstore,
way geometry(Point,3857)
);
CREATE INDEX planet_osm_poi_bigfunc_geo
ON osm_current.planet_osm_point
USING gist
(bigfunc(tags), way)
WHERE bigfunc(tags) <= 14;
ON osm_current.planet_osm_point
USING gist
(bigfunc(tags), way)
WHERE bigfunc(tags) <= 14;
CREATE TABLE osm_current.planet_osm_polygon
(
osm_id bigint NOT NULL,
tags hstore,
way geometry(Polygon,3857)
);
(
osm_id bigint NOT NULL,
tags hstore,
way geometry(Polygon,3857)
);
CREATE INDEX planet_osm_polygon_bigfunc_geo
ON osm_current.planet_osm_polygon
USING gist
(bigfunc(tags), way)
WHERE bigfunc(tags) <= 14;
ON osm_current.planet_osm_polygon
USING gist
(bigfunc(tags), way)
WHERE bigfunc(tags) <= 14;
Query:
SELECT *
FROM osm_current.planet_osm_polygon
WHERE bigfunc(tags) <= 7
AND NOT EXISTS(
SELECT *
FROM osm_current.planet_osm_point
WHERE bigfunc(planet_osm_point.tags) <= 7
AND ST_Intersects(planet_osm_point.way,planet_osm_polygon.way)
AND bigfunc2(planet_osm_point.tags) = bigfunc2(planet_osm_polygon.tags)
)
AND ST_Intersects(
'SRID=3857;POLYGON((15012477.510296581 3741379.0533562037,15012477.510296581 4398859.837299369,15669958.252794353 4398859.837299369,15669958.252794353 3741379.0533562037,15012477.510296581 3741379.0533562037))'::geometry,
way)
FROM osm_current.planet_osm_polygon
WHERE bigfunc(tags) <= 7
AND NOT EXISTS(
SELECT *
FROM osm_current.planet_osm_point
WHERE bigfunc(planet_osm_point.tags) <= 7
AND ST_Intersects(planet_osm_point.way,planet_osm_polygon.way)
AND bigfunc2(planet_osm_point.tags) = bigfunc2(planet_osm_polygon.tags)
)
AND ST_Intersects(
'SRID=3857;POLYGON((15012477.510296581 3741379.0533562037,15012477.510296581 4398859.837299369,15669958.252794353 4398859.837299369,15669958.252794353 3741379.0533562037,15012477.510296581 3741379.0533562037))'::geometry,
way)
Normal execution: I canceled it after 1 hour...
Explain gives:
Gather (cost=22998304.12..81977433.81 rows=2628686 width=262)
Workers Planned: 2
-> Parallel Hash Anti Join (cost=22997304.12..81713565.21 rows=1095286 width=262)
Hash Cond: (bigfunc2(planet_osm_polygon.tags) = bigfunc2(planet_osm_point.tags))
Join Filter: st_intersects(planet_osm_point.way, planet_osm_polygon.way)
-> Parallel Bitmap Heap Scan on planet_osm_polygon (cost=51152.38..30790214.58 rows=1096787 width=262)
Recheck Cond: (bigfunc(tags) <= 7)
Filter: st_intersects('0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry, way)
-> Bitmap Index Scan on planet_osm_polygon_bigfunc_geo (cost=0.00..50494.31 rows=2632289 width=0)
Index Cond: ((bigfunc(tags) <= 7) AND (way && '0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry))
-> Parallel Hash (cost=22204690.21..22204690.21 rows=23875962 width=126)
-> Parallel Bitmap Heap Scan on planet_osm_point (cost=309564.90..22204690.21 rows=23875962 width=126)
Recheck Cond: (bigfunc(tags) <= 7)
-> Bitmap Index Scan on planet_osm_poi_bigfunc_geo (cost=0.00..295239.32 rows=57302310 width=0)
Index Cond: (bigfunc(tags) <= 7)
Workers Planned: 2
-> Parallel Hash Anti Join (cost=22997304.12..81713565.21 rows=1095286 width=262)
Hash Cond: (bigfunc2(planet_osm_polygon.tags) = bigfunc2(planet_osm_point.tags))
Join Filter: st_intersects(planet_osm_point.way, planet_osm_polygon.way)
-> Parallel Bitmap Heap Scan on planet_osm_polygon (cost=51152.38..30790214.58 rows=1096787 width=262)
Recheck Cond: (bigfunc(tags) <= 7)
Filter: st_intersects('0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry, way)
-> Bitmap Index Scan on planet_osm_polygon_bigfunc_geo (cost=0.00..50494.31 rows=2632289 width=0)
Index Cond: ((bigfunc(tags) <= 7) AND (way && '0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry))
-> Parallel Hash (cost=22204690.21..22204690.21 rows=23875962 width=126)
-> Parallel Bitmap Heap Scan on planet_osm_point (cost=309564.90..22204690.21 rows=23875962 width=126)
Recheck Cond: (bigfunc(tags) <= 7)
-> Bitmap Index Scan on planet_osm_poi_bigfunc_geo (cost=0.00..295239.32 rows=57302310 width=0)
Index Cond: (bigfunc(tags) <= 7)
When setting enable_hashjoin to false it gets radically different:
Gather (cost=52152.79..169588182414.71 rows=2628686 width=262) (actual time=11.162..1037.116 rows=5381 loops=1)
Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=380500 read=5531
-> Nested Loop Anti Join (cost=51152.79..169587918546.11 rows=1095286 width=262) (actual time=2.867..1015.295 rows=1794 loops=3)
Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way
Buffers: shared hit=380500 read=5531
Worker 0: actual time=1.168..1011.822 rows=1834 loops=1
Buffers: shared hit=129515 read=1663
Worker 1: actual time=1.236..1010.438 rows=1858 loops=1
Buffers: shared hit=129837 read=1632
-> Parallel Bitmap Heap Scan on osm_current.planet_osm_polygon (cost=51152.38..30790214.58 rows=1096787 width=262) (actual time=1.846..23.809 rows=1853 loops=3)
Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way
Recheck Cond: (bigfunc(planet_osm_polygon.tags) <= 7)
Filter: st_intersects('0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry, planet_osm_polygon.way)
Heap Blocks: exact=1235
Buffers: shared hit=9 read=4104
Worker 0: actual time=0.135..22.343 rows=1902 loops=1
Buffers: shared hit=2 read=1317
Worker 1: actual time=0.174..21.743 rows=1904 loops=1
Buffers: shared hit=3 read=1262
-> Bitmap Index Scan on planet_osm_polygon_bigfunc_geo (cost=0.00..50494.31 rows=2632289 width=0) (actual time=4.552..4.564 rows=5560 loops=1)
Index Cond: ((bigfunc(planet_osm_polygon.tags) <= 7) AND (planet_osm_polygon.way && '0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry))
Buffers: shared read=294
-> Index Scan using planet_osm_poi_bigfunc_geo on osm_current.planet_osm_point (cost=0.42..154805.97 rows=29 width=126) (actual time=0.534..0.534 rows=0 loops=5560)
Output: planet_osm_point.osm_id, planet_osm_point.tags, planet_osm_point.way
Index Cond: ((bigfunc(planet_osm_point.tags) <= 7) AND (planet_osm_point.way && planet_osm_polygon.way))
Filter: ((bigfunc2(planet_osm_point.tags) = bigfunc2(planet_osm_polygon.tags)) AND st_intersects(planet_osm_point.way, planet_osm_polygon.way))
Rows Removed by Filter: 0
Buffers: shared hit=380491 read=1427
Worker 0: actual time=0.519..0.519 rows=0 loops=1902
Buffers: shared hit=129513 read=346
Worker 1: actual time=0.518..0.518 rows=0 loops=1904
Buffers: shared hit=129834 read=370
Planning Time: 8.837 ms
Execution Time: 1037.867 ms
Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=380500 read=5531
-> Nested Loop Anti Join (cost=51152.79..169587918546.11 rows=1095286 width=262) (actual time=2.867..1015.295 rows=1794 loops=3)
Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way
Buffers: shared hit=380500 read=5531
Worker 0: actual time=1.168..1011.822 rows=1834 loops=1
Buffers: shared hit=129515 read=1663
Worker 1: actual time=1.236..1010.438 rows=1858 loops=1
Buffers: shared hit=129837 read=1632
-> Parallel Bitmap Heap Scan on osm_current.planet_osm_polygon (cost=51152.38..30790214.58 rows=1096787 width=262) (actual time=1.846..23.809 rows=1853 loops=3)
Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way
Recheck Cond: (bigfunc(planet_osm_polygon.tags) <= 7)
Filter: st_intersects('0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry, planet_osm_polygon.way)
Heap Blocks: exact=1235
Buffers: shared hit=9 read=4104
Worker 0: actual time=0.135..22.343 rows=1902 loops=1
Buffers: shared hit=2 read=1317
Worker 1: actual time=0.174..21.743 rows=1904 loops=1
Buffers: shared hit=3 read=1262
-> Bitmap Index Scan on planet_osm_polygon_bigfunc_geo (cost=0.00..50494.31 rows=2632289 width=0) (actual time=4.552..4.564 rows=5560 loops=1)
Index Cond: ((bigfunc(planet_osm_polygon.tags) <= 7) AND (planet_osm_polygon.way && '0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry))
Buffers: shared read=294
-> Index Scan using planet_osm_poi_bigfunc_geo on osm_current.planet_osm_point (cost=0.42..154805.97 rows=29 width=126) (actual time=0.534..0.534 rows=0 loops=5560)
Output: planet_osm_point.osm_id, planet_osm_point.tags, planet_osm_point.way
Index Cond: ((bigfunc(planet_osm_point.tags) <= 7) AND (planet_osm_point.way && planet_osm_polygon.way))
Filter: ((bigfunc2(planet_osm_point.tags) = bigfunc2(planet_osm_polygon.tags)) AND st_intersects(planet_osm_point.way, planet_osm_polygon.way))
Rows Removed by Filter: 0
Buffers: shared hit=380491 read=1427
Worker 0: actual time=0.519..0.519 rows=0 loops=1902
Buffers: shared hit=129513 read=346
Worker 1: actual time=0.518..0.518 rows=0 loops=1904
Buffers: shared hit=129834 read=370
Planning Time: 8.837 ms
Execution Time: 1037.867 ms
so finishes in a second.
where bigfunc(tags) is a huge case when list wich returns an integer between 1 and 20
and bigfunc2(tags) is a huge case which returns a text[] with mostly 1 item in it
The primary cause seems to be the selectivity, but table+index is vacuumed and analyzed...
This query is used in a cursor so it's not easy for me to disable the hashjoin in our workflow, the only thing I can do to fix it is to keep adding
AND ST_Intersects(
'SRID=3857;POLYGON((15012477.510296581 3741379.0533562037,15012477.510296581 4398859.837299369,15669958.252794353 4398859.837299369,15669958.252794353 3741379.0533562037,15012477.510296581 3741379.0533562037))'::geometry,
way)
'SRID=3857;POLYGON((15012477.510296581 3741379.0533562037,15012477.510296581 4398859.837299369,15669958.252794353 4398859.837299369,15669958.252794353 3741379.0533562037,15012477.510296581 3741379.0533562037))'::geometry,
way)
until the queryplan is reasonable (in the most extreme case I needed 5 of them).
This is ofcourse not the way to go (also disabling things in production is not what I want) so any pointers on how to let postgres use a better estimation on the number of rows is welcome!
Paul
P.S. when replying, please include me too