> > > > I have two tables that are georeferenced (although in this case > I'm > > > not using PostGIS) that I need to join. > > > > A ( lat | lon | depth | value) > > > > |A| = 1,100,000 > > > > > > > > B ( lat | lon | attributes) > > > > |B| = 14,000,000 > > > > > > > > A is a special case because the lat / lon values are all at half > > > degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary. > > > > I've written a function in B called getSpecialLat(latitude) and > > > getSpecialLon(longitude) to calculate the correct A latitude and > built > > > an index on both functions. > > > > > > > > Here's the query that I'm trying, but it's rather slow: > > > > SELECT B.* FROM B, > > > > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = > 0) > > > AS foo > > > > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) = > > > foo.lon > > > > > > > > "Nested Loop (cost=3569.88..32055.02 rows=1414 width=422)" > > > > " -> Index Scan using A_valueidx on A (cost=0.00..555.26 > rows=6 width=16)" > > > > " Index Cond: ((value > 0) AND (value < 2))" > > > > " Filter: (depth = 0)" > > > > " -> Bitmap Heap Scan on B (cost=3569.88..5029.48 rows=424 > width=422)" > > > > " Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon) > AND (getSpecialLat((B.lat)::numeric) = A.lat))" > > > > " -> BitmapAnd (cost=3569.88..3569.88 rows=424 width=0)" > > > > " -> Bitmap Index Scan on Blonidx > (cost=0.00..1760.38 rows=84859 width=0)" > > > > " Index Cond: (getSpecialLon((B.lon)::numeric) > = A.lon)" > > > > " -> Bitmap Index Scan on Blatidx > (cost=0.00..1766.81 rows=84859 width=0)" > > > > " Index Cond: > (getSpeicalLat((B.latitude)::numeric) = A.lat)" > > > > > > > > Am I missing something in terms of speeding up this query? > > > > > > I'd be interested to see if the query rewritten as a JOIN would be > > > faster. > > > > I can write it like this: > > select b.* > > from b join a on (getwoalatitude(b.latitude::numeric) = a.lat > > and getwoalongitude(b.longitude::numeric) = a.lon) > > where > > a.value > 0 and a.value < 2 and a.depth = 0 > > > > which results in this plan: > > "Nested Loop (cost=1387.20..13152982.35 rows=1625767 width=422)" > > " -> Index Scan using a_depthidx on a_(cost=0.00..1464.07 rows=6897 > width=16)" > > " Index Cond: (depth = 0)" > > " Filter: ((value > 0::numeric) AND (value < 2::numeric))" > > " -> Bitmap Heap Scan on b (cost=1387.20..1686.37 rows=424 > width=422)" > > " Recheck Cond: ((getSpecialLon((b.lon)::numeric) = a.lon) AND > (getSpecialLat((b.lat)::numeric) = a.lat))" > > " -> BitmapAnd (cost=1387.20..1387.20 rows=424 width=0)" > > " -> Bitmap Index Scan on Blonidx (cost=0.00..672.15 > rows=84859 width=0)" > > " Index Cond: (getSpecialLon((b.lon)::numeric) = > a.lon)" > > " -> Bitmap Index Scan on Blatidx (cost=0.00..672.36 > rows=84859 width=0)" > > " Index Cond: (getSpecialLat((b.lat)::numeric) = > a.lat)" > > > > However it's still taking ages to execute (over five minutes - I > stopped it before it finished) > > Do you really expect that query to return 1.6M rows? I doubt it, since > the subselect version only returns 1400. No - I do not expect it to return 1.6M... > If you do a vacuum analyze on > those two tables, does it speed either of the queries up? Here are the new results (with explain analyze): EXPLAIN ANALYZE SELECT B.* FROM B, (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0) AS foo WHERE getwoalatitude(B.latitude::numeric) = foo.lat AND getwoalongitude(B.longitude::numeric) = foo.lon after vacuum analyze: "Merge Join (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=752983.201..941125.197 rows=226941 loops=1)" " Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon = (getwoalongitude((b.longitude)::numeric))))" " -> Sort (cost=2019.51..2037.61 rows=7239 width=16) (actual time=30.704..32.171 rows=2111 loops=1)" " Sort Key: a.lat, a.lon" " Sort Method: quicksort Memory: 212kB" " -> Index Scan using depthidx on a (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.533..24.631 rows=2111 loops=1)" " Index Cond: (depth = 0)" " Filter: ((value > 0::numeric) AND (value < 2::numeric))" " -> Materialize (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=751324.751..919278.574 rows=16963350 loops=1)" " -> Sort (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=751324.744..820522.604 rows=16963350 loops=1)" " Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))" " Sort Method: external merge Disk: 4599344kB" " -> Seq Scan on b (cost=0.00..750696.00 rows=16971900 width=420) (actual time=1.781..229158.949 rows=16971901 loops=1)" "Total runtime: 942295.914 ms" EXPLAIN ANALYZE SELECT b.* FROM b JOIN a ON (getwoalatitude(b.latitude::numeric) = a.lat AND getwoalongitude(b.longitude::numeric) = a.lon) WHERE a.value > 0 AND a.value < 2 AND a.depth = 0 "Merge Join (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=755255.801..906407.961 rows=226941 loops=1)" " Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon = (getwoalongitude((b.longitude)::numeric))))" " -> Sort (cost=2019.51..2037.61 rows=7239 width=16) (actual time=27.658..29.108 rows=2111 loops=1)" " Sort Key: a.lat, a.lon" " Sort Method: quicksort Memory: 212kB" " -> Index Scan using depthidx on a (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.467..21.646 rows=2111 loops=1)" " Index Cond: (depth = 0)" " Filter: ((value > 0::numeric) AND (value < 2::numeric))" " -> Materialize (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=753605.414..884549.890 rows=16963350 loops=1)" " -> Sort (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=753605.407..822844.299 rows=16963350 loops=1)" " Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))" " Sort Method: external merge Disk: 4599344kB" " -> Seq Scan on b (cost=0.00..750696.00 rows=16971900 width=420) (actual time=0.095..229888.646 rows=16971901 loops=1)" "Total runtime: 911284.022 ms" > Try "set enable_nestloop=off" and rerun the two queries. If that helps > and analyze didn't then perhaps you need to increase the statistics > target on those two tables, or perhaps you'll just have to use that > set command to force the planner to avoid a nested loop. > > Hope one of these helps. Thank, Bill, it's a bit faster. Any other thoughts? -Chris -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general