Varying the segment length upwards might have a salutary effect for a while, as the efficiency improvement of fewer inner loops battles with the inefficiency of having more points selected by the index filter. Worth an experiment.
P
On Thu, Jan 5, 2017 at 1:00 PM, Israel Brewster <israel@xxxxxxxxxxxxxx> wrote:
On Jan 5, 2017, at 10:38 AM, Paul Ramsey <pramsey@xxxxxxxxxxxxxxxxx> wrote:Yes, you did. You want a query that spits out a tupleset of goemetries (one each for each wee segment), and then you can join that set to your main table using st_dwithin() as the join clause.So start by ditching the main table and just work on a query that generates a pile of wee segments.Ahhh, I see you've done this sort of thing before (http://blog.cleverelephant.ca/2015/02/breaking- ) :-)linestring-into-segments.html So following that advice I came up with the following query:WITH dump AS (SELECTST_DumpPoints(ST_Segmentize(ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'), 600)::geometry) as pt),pts AS (SELECT (pt).geom, (pt).path[1] as vert FROM dump)SELECT elevationFROM dataINNER JOIN (SELECTST_MakeLine(ARRAY[a.geom, b.geom]) as short_lineFROM pts aINNER JOIN pts bON a.vert=b.vert-1 AND b.vert>1) segmentsON ST_DWithin(location, segments.short_line, 600)ORDER BY elevation DESC limit 1;Which yields the following EXPLAIN ANALYZE (https://explain.depesz.com/s/RsTD ):QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ --------------------------
Limit (cost=11611706.90..11611706.91 rows=1 width=4) (actual time=1171.814..1171.814 rows=1 loops=1)
CTE dump
-> Result (cost=0.00..5.25 rows=1000 width=32) (actual time=0.024..1.989 rows=1939 loops=1)
CTE pts
-> CTE Scan on dump (cost=0.00..20.00 rows=1000 width=36) (actual time=0.032..4.071 rows=1939 loops=1)
-> Sort (cost=11611681.65..11611768.65 rows=34800 width=4) (actual time=1171.813..1171.813 rows=1 loops=1)
Sort Key: data.elevation DESC
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.55..11611507.65 rows=34800 width=4) (actual time=0.590..1167.615 rows=28408 loops=1)
-> Nested Loop (cost=0.00..8357.50 rows=1665 width=64) (actual time=0.046..663.475 rows=1938 loops=1)
Join Filter: (a.vert = (b.vert - 1))
Rows Removed by Join Filter: 3755844
-> CTE Scan on pts b (cost=0.00..22.50 rows=333 width=36) (actual time=0.042..0.433 rows=1938 loops=1)
Filter: (vert > 1)
Rows Removed by Filter: 1
-> CTE Scan on pts a (cost=0.00..20.00 rows=1000 width=36) (actual time=0.000..0.149 rows=1939 loops=1938)
-> Index Scan using location_gix on data (cost=0.55..6968.85 rows=1 width=36) (actual time=0.085..0.256 rows=15 loops=1938)
Index Cond: (location && _st_expand((st_makeline(ARRAY[a.geom, b.geom]))::geography, '600'::double precision))
Filter: (((st_makeline(ARRAY[a.geom, b.geom]))::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, (st_makeline(ARRAY[a.geom, b.geom]))::geography, '600'::double precision, true))
Rows Removed by Filter: 7
Planning time: 4.318 ms
Execution time: 1171.994 ms
(22 rows)So not bad. Went from 20+ seconds to a little over 1 second. Still noticeable for a end user, but defiantly usable - and like mentioned, that's a worst-case scenario query. Thanks!Of course, if you have any suggestions for further improvement, I'm all ears :-)----------------------------------------------- Israel BrewsterSystems Analyst IIRavn Alaska5245 Airport Industrial RdFairbanks, AK 99709----------------------------------------------- On Thu, Jan 5, 2017 at 11:36 AM, Israel Brewster <israel@xxxxxxxxxxxxxx> wrote:On Jan 5, 2017, at 8:50 AM, Paul Ramsey <pramsey@xxxxxxxxxxxxxxxxx> wrote:The index filters using bounding boxes. A long, diagonal route will have a large bounding box, relative to the area you actually care about (within a narrow strip of the route). Use ST_Segmentize() to add points to your route, ST_DumpPoints() to dump those out as point and ST_MakeLine to generate new lines from those points, each line very short. The maximum index effectiveness will come when your line length is close to your buffer width.
POk, I think I understand the concept. So attempting to follow your advice, I modified the query to be:SELECT elevationFROM dataWHEREST_DWithin(location,(SELECT ST_MakeLine(geom)::geography as split_lineFROM (SELECT(ST_DumpPoints(ST_Segmentize(ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'), 600)::geometry)).geom) s1),600)ORDER BY elevation DESC limit 1;It took some fiddling to find a syntax that Postgresql would accept, but eventually that's what I came up with. Unfortunately, far from improving performance, it killed it - in running the query, it went from 22 seconds to several minutes (EXPLAIn ANALYZE has yet to return a result). Looking at the query execution plan shows, at least partially, why:QUERY PLAN
------------------------------------------------------------ ------------------
Limit (cost=17119748.98..17119748.98 rows=1 width=4)
InitPlan 1 (returns $0)
-> Aggregate (cost=17.76..17.77 rows=1 width=32)
-> Result (cost=0.00..5.25 rows=1000 width=32)
-> Sort (cost=17119731.21..17171983.43 rows=20900890 width=4)
Sort Key: data.elevation DESC
-> Seq Scan on data (cost=0.00..17015226.76 rows=20900890 width=4)
Filter: st_dwithin(location, $0, '600'::double precision)
(8 rows)So apparently it is now doing a sequential scan on data rather than using the index. And, of course, sorting 20 million rows is not trivial either. Did I do something wrong with forming the query?----------------------------------------------- Israel BrewsterSystems Analyst IIRavn Alaska5245 Airport Industrial RdFairbanks, AK 99709----------------------------------------------- On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster <israel@xxxxxxxxxxxxxx> wrote:I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of latitude (numeric), longitude(numeric), elevation(integer) data, along with a PostGIS (2.3.0) geometry column (location), running on a CentOS 6.8 box with 64GB RAM and a RAID10 SSD data drive. I'm trying to get the maximum elevation along a path, for which purpose I've come up with the following query (for one particular path example):SELECT elevation FROM data WHERE ST_DWithin(location, ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'), 600) ORDER BY elevation LIMIT 1; The EXPLAIN ANALYZE output of this particular query (https://explain.depesz.com/s/heZ ) shows:QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------
Limit (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842 rows=1 loops=1)
-> Sort (cost=4.83..4.83 rows=1 width=4) (actual time=22653.837..22653.837 rows=1 loops=1)
Sort Key: elevation DESC
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using location_gix on data (cost=0.42..4.82 rows=1 width=4) (actual time=15.786..22652.041 rows=11081 loops=1)
Index Cond: (location && '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E407 13D0AD7A39863C086C77E164BD2514 0'::geography)
Filter: (('0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E4 0713D0AD7A39863C086C77E164BD25 140'::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, '0102000 020E6100000020000002C11A8FE41C 062C0DFC2BAF1EE964E40713D0AD7A 39863C086C77E164BD25140'::geog raphy, '600'::double precision, true))
Rows Removed by Filter: 4934534
Planning time: 0.741 ms
Execution time: 22653.906 ms
(10 rows)So it is using the index properly, but still takes a good 22 seconds to run, most of which appears to be in the Index Scan.Is there any way to improve this, or is this going to be about as good as it gets with the number of rows being dealt with? I was planning to use this for a real-time display - punch in a couple of points, get some information about the route between, including maximum elevation - but with it taking 22 seconds for the longer routes at least, that doesn't make for the best user experience.It's perhaps worth noting that the example above is most likely a worst case scenario. I would expect the vast majority of routes to be significantly shorter, and I want to say the shorter routes query much faster [testing needed]. That said, the faster the better, even for short routes :-)----------------------------------------------- Israel BrewsterSystems Analyst IIRavn Alaska5245 Airport Industrial RdFairbanks, AK 99709-----------------------------------------------