Hi Jeff,
the way I see it, is it a poor man's implementation of 'full-text' search. I just discussed it with out navdata team and we might be redefine how do we do the search. Regardless of that, I think that issue with Postgres stands.
I tried now to see, how the query would behave if we always had left-anchored pattern and that would allow us to stick to btree indexes.
* Query
------------------------------------
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT
r.*
FROM navdata.route r
INNER JOIN navdata.point op ON r.frompointguid = op.guid
INNER JOIN navdata.point dp ON r.topointguid = dp.guid
WHERE
r.routeidentifier LIKE '%'
AND tsrange(r.startvalid, r.endvalid) @> now() :: TIMESTAMP
AND (NOT false :: BOOLEAN OR r.source = ANY (ARRAY[] :: VARCHAR []))
AND op.identifier LIKE 'LOW%'
AND op.type = ANY (ARRAY['PA'] :: VARCHAR [])
AND tsrange(op.startvalid, op.endvalid) @> now() :: TIMESTAMP
AND dp.identifier LIKE '%' :: VARCHAR
AND dp.type = ANY (ARRAY['PA'] :: VARCHAR [])
AND tsrange(dp.startvalid, dp.endvalid) @> now() :: TIMESTAMP
ORDER BY r.routeidentifier
LIMIT 1000;
SELECT
r.*
FROM navdata.route r
INNER JOIN navdata.point op ON r.frompointguid = op.guid
INNER JOIN navdata.point dp ON r.topointguid = dp.guid
WHERE
r.routeidentifier LIKE '%'
AND tsrange(r.startvalid, r.endvalid) @> now() :: TIMESTAMP
AND (NOT false :: BOOLEAN OR r.source = ANY (ARRAY[] :: VARCHAR []))
AND op.identifier LIKE 'LOW%'
AND op.type = ANY (ARRAY['PA'] :: VARCHAR [])
AND tsrange(op.startvalid, op.endvalid) @> now() :: TIMESTAMP
AND dp.identifier LIKE '%' :: VARCHAR
AND dp.type = ANY (ARRAY['PA'] :: VARCHAR [])
AND tsrange(dp.startvalid, dp.endvalid) @> now() :: TIMESTAMP
ORDER BY r.routeidentifier
LIMIT 1000;
* Analysis
---------------------------------------
Limit (cost=646.48..646.48 rows=1 width=349) (actual time=1375.359..1376.447 rows=1000 loops=1)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Buffers: shared hit=79276
-> Sort (cost=646.48..646.48 rows=1 width=349) (actual time=1375.356..1375.785 rows=1000 loops=1)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Sort Key: r.routeidentifier
Sort Method: top-N heapsort Memory: 321kB
Buffers: shared hit=79276
-> Nested Loop (cost=250.30..646.47 rows=1 width=349) (actual time=202.826..1372.178 rows=2596 loops=1)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Buffers: shared hit=79276
-> Nested Loop (cost=249.87..621.96 rows=1 width=349) (actual time=202.781..1301.135 rows=2602 loops=1)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Buffers: shared hit=51974
-> Index Scan using idx_point_11 on navdata.point op (cost=0.43..107.02 rows=2 width=16) (actual time=0.055..0.214 rows=7 loops=1)
Output: op.uid, op.guid, op.airportguid, op.identifier, op.icaocode, op.name, op.type, op.coordinates, op.fir, op.navaidfrequency, op.elevation, op.magneticvariance, op.startvalid, op.endvalid, op.revisionuid, op.source, op.leveltype
Index Cond: (((op.type)::text = ANY ('{PA}'::text[])) AND ((op.identifier)::text ~>=~ 'LOW'::text) AND ((op.identifier)::text ~<~ 'LOX'::text))
Filter: (((op.identifier)::text ~~ 'LOW%'::text) AND (tsrange(op.startvalid, op.endvalid) @> (now())::timestamp without time zone))
Rows Removed by Filter: 42
Buffers: shared hit=52
-> Bitmap Heap Scan on navdata.route r (cost=249.44..257.45 rows=2 width=349) (actual time=183.255..185.491 rows=372 loops=7)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Recheck Cond: ((r.frompointguid = op.guid) AND (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone))
Filter: ((r.routeidentifier)::text ~~ '%'::text)
Heap Blocks: exact=2140
Buffers: shared hit=51922
-> BitmapAnd (cost=249.44..249.44 rows=2 width=0) (actual time=183.197..183.197 rows=0 loops=7)
Buffers: shared hit=49782
-> Bitmap Index Scan on idx_route_02 (cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884 loops=7)
Index Cond: (r.frompointguid = op.guid)
Buffers: shared hit=47
-> Bitmap Index Scan on idx_route_07 (cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858 rows=579062 loops=7)
Index Cond: (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone)
Buffers: shared hit=49735
-> Index Scan using cidx_point on navdata.point dp (cost=0.43..24.50 rows=1 width=16) (actual time=0.008..0.025 rows=1 loops=2602)
Output: dp.uid, dp.guid, dp.airportguid, dp.identifier, dp.icaocode, dp.name, dp.type, dp.coordinates, dp.fir, dp.navaidfrequency, dp.elevation, dp.magneticvariance, dp.startvalid, dp.endvalid, dp.revisionuid, dp.source, dp.leveltype
Index Cond: (dp.guid = r.topointguid)
Filter: (((dp.type)::text = ANY ('{PA}'::text[])) AND ((dp.identifier)::text ~~ '%'::text) AND (tsrange(dp.startvalid, dp.endvalid) @> (now())::timestamp without time zone))
Rows Removed by Filter: 6
Buffers: shared hit=27302
Planning time: 12.202 ms
Execution time: 1376.912 ms
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Buffers: shared hit=79276
-> Sort (cost=646.48..646.48 rows=1 width=349) (actual time=1375.356..1375.785 rows=1000 loops=1)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Sort Key: r.routeidentifier
Sort Method: top-N heapsort Memory: 321kB
Buffers: shared hit=79276
-> Nested Loop (cost=250.30..646.47 rows=1 width=349) (actual time=202.826..1372.178 rows=2596 loops=1)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Buffers: shared hit=79276
-> Nested Loop (cost=249.87..621.96 rows=1 width=349) (actual time=202.781..1301.135 rows=2602 loops=1)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Buffers: shared hit=51974
-> Index Scan using idx_point_11 on navdata.point op (cost=0.43..107.02 rows=2 width=16) (actual time=0.055..0.214 rows=7 loops=1)
Output: op.uid, op.guid, op.airportguid, op.identifier, op.icaocode, op.name, op.type, op.coordinates, op.fir, op.navaidfrequency, op.elevation, op.magneticvariance, op.startvalid, op.endvalid, op.revisionuid, op.source, op.leveltype
Index Cond: (((op.type)::text = ANY ('{PA}'::text[])) AND ((op.identifier)::text ~>=~ 'LOW'::text) AND ((op.identifier)::text ~<~ 'LOX'::text))
Filter: (((op.identifier)::text ~~ 'LOW%'::text) AND (tsrange(op.startvalid, op.endvalid) @> (now())::timestamp without time zone))
Rows Removed by Filter: 42
Buffers: shared hit=52
-> Bitmap Heap Scan on navdata.route r (cost=249.44..257.45 rows=2 width=349) (actual time=183.255..185.491 rows=372 loops=7)
Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
Recheck Cond: ((r.frompointguid = op.guid) AND (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone))
Filter: ((r.routeidentifier)::text ~~ '%'::text)
Heap Blocks: exact=2140
Buffers: shared hit=51922
-> BitmapAnd (cost=249.44..249.44 rows=2 width=0) (actual time=183.197..183.197 rows=0 loops=7)
Buffers: shared hit=49782
-> Bitmap Index Scan on idx_route_02 (cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884 loops=7)
Index Cond: (r.frompointguid = op.guid)
Buffers: shared hit=47
-> Bitmap Index Scan on idx_route_07 (cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858 rows=579062 loops=7)
Index Cond: (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone)
Buffers: shared hit=49735
-> Index Scan using cidx_point on navdata.point dp (cost=0.43..24.50 rows=1 width=16) (actual time=0.008..0.025 rows=1 loops=2602)
Output: dp.uid, dp.guid, dp.airportguid, dp.identifier, dp.icaocode, dp.name, dp.type, dp.coordinates, dp.fir, dp.navaidfrequency, dp.elevation, dp.magneticvariance, dp.startvalid, dp.endvalid, dp.revisionuid, dp.source, dp.leveltype
Index Cond: (dp.guid = r.topointguid)
Filter: (((dp.type)::text = ANY ('{PA}'::text[])) AND ((dp.identifier)::text ~~ '%'::text) AND (tsrange(dp.startvalid, dp.endvalid) @> (now())::timestamp without time zone))
Rows Removed by Filter: 6
Buffers: shared hit=27302
Planning time: 12.202 ms
Execution time: 1376.912 ms
Why think is weird here is this:
-> BitmapAnd (cost=249.44..249.44 rows=2 width=0) (actual time=183.197..183.197 rows=0 loops=7)
Buffers: shared hit=49782
-> Bitmap Index Scan on idx_route_02 (cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884 loops=7)
Index Cond: (r.frompointguid = op.guid)
Buffers: shared hit=47
-> Bitmap Index Scan on idx_route_07 (cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858 rows=579062 loops=7)
Index Cond: (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone)
Buffers: shared hit=49735
Buffers: shared hit=49782
-> Bitmap Index Scan on idx_route_02 (cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884 loops=7)
Index Cond: (r.frompointguid = op.guid)
Buffers: shared hit=47
-> Bitmap Index Scan on idx_route_07 (cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858 rows=579062 loops=7)
Index Cond: (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone)
Buffers: shared hit=49735
Why would postgres choose to use second index idx_route_07 at all when row estimate is way higher then on idx_route_02? Wouldn't it be better just to use one index with lower number of estimated rows and then filter?
Thanks
Sasa
On 20 June 2018 at 16:53, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Wed, Jun 20, 2018 at 9:21 AM, Sasa Vilic <sasavilic@xxxxxxxxx> wrote:Query that we have finds all routes between two set of points. A set is adynamically/loosely defined by pattern given by the user input. So for exampleif user wants to find all routes between international airports in Austriatoward London Heathrow, he or she would use 'LOW%' as :from_point_identifierand 'EGLL' as :to_point_identifier. Please keep in mind that is a simple case,and that user is allowed to define search term any way he/she see it fit,
i.e. '%OW%', 'EG%'.Letting users do substring searches on airport codes in the middle of a complex query makes no sense. Do all airports with 'OW' in the middle of them having something in common with each other? If people can't remember the real airport code of the airport they are using, you should offer a look-up tool which they can use to figure that out **before** hitting the main query.But taking for granted your weird use case, the most obvious improvement to the PostgreSQL code that I can see is in the executor, not the planner. There is no reason to recompute the bitmap on idx_point_08 each time through the nested loop, as the outcome of that scan doesn't depend on the outer tuple. Presumably the reason this happens is that it is being 'BitmapAnd'ed with another bitmap index scan which does depend on the outer tuple, and it is just not smart enough to reuse the stable bitmap while recomputing the parameterized one.Cheers,Jeff