> I thought the benefit of KNN was that you could retrieve the rows in > distance order, so that a query for the closest 20 locations (for > example) would be very fast. I wouldn't have expected it to be > helpful when you're selecting all the rows regardless of distance. Kevin, Thanks for the feedback. You are right that my "reduced test case" wasn't a good approximation. I added a limit, to simulate finding the 100 zipcodes closest to 90210. Below I compare 4 approaches to the same query: 1. Cube search 2. Earth Distance Search 3. Simple point distance (no index) 4. Simple point distance (KNN) Now KNN benchmarks to be almost 100x faster! That's very promising. Then there's only the issue that simple point distance is not expected to be a good enough approximation of earth-distances. Perhaps that can be solved by pre-computing coordinates based on the lat/long pairs.... much like the map projections used to present a curved surface on a flat map? Given that's OK to be be a few miles off, it seems we have some leeway here. Recommendations? Mark EXPLAIN ANALYZE SELECT zipcode, cube_distance( '(-2513120.64361786, -4645511.0460328, 3575538.9507084)', zipcodes.earth_coords)/1609.344 AS radius FROM zipcodes ORDER BY radius LIMIT 100; --------------------------------------------------------------- Limit (cost=2946.70..2946.95 rows=100 width=62) (actual time=167.650..168.064 rows=100 loops=1) -> Sort (cost=2946.70..3050.40 rows=41483 width=62) (actual time=167.644..167.829 rows=100 loops=1) Sort Key: ((cube_distance('(-2513120.64361786, -4645511.0460328, 3575538.9507084)'::cube, earth_coords) / 1609.344::double precision)) Sort Method: top-N heapsort Memory: 20kB -> Seq Scan on zipcodes (cost=0.00..1361.24 rows=41483 width=62) (actual time=0.030..90.807 rows=41483 loops=1) Total runtime: 168.300 ms ############################################################3 -- Using Earthdistance EXPLAIN ANALYZE SELECT zipcode, lon_lat <@> '(-118.412426,34.096629)' As radius FROM zipcodes ORDER BY lon_lat <@> '(-118.412426,34.096629)' LIMIT 100; ------------------------------------------------------------ Limit (cost=2842.99..2843.24 rows=100 width=22) (actual time=187.995..188.451 rows=100 loops=1) -> Sort (cost=2842.99..2946.70 rows=41483 width=22) (actual time=187.989..188.149 rows=100 loops=1) Sort Key: ((lon_lat <@> '(-118.412426,34.096629)'::point)) Sort Method: top-N heapsort Memory: 20kB -> Seq Scan on zipcodes (cost=0.00..1257.54 rows=41483 width=22) (actual time=0.033..108.203 rows=41483 loops=1) Total runtime: 188.660 ms ########################################## Using simple point distance, but with no Gist Index: EXPLAIN ANALYZE SELECT zipcode, lon_lat <-> '(-118.412426,34.096629)' As radius FROM zipcodes ORDER BY lon_lat <-> '(-118.412426,34.096629)' LIMIT 100; -------------------------------------------------------- Limit (cost=2842.99..2843.24 rows=100 width=22) (actual time=160.574..161.057 rows=100 loops=1) -> Sort (cost=2842.99..2946.70 rows=41483 width=22) (actual time=160.568..160.691 rows=100 loops=1) Sort Key: ((lon_lat <-> '(-118.412426,34.096629)'::point)) Sort Method: top-N heapsort Memory: 20kB -> Seq Scan on zipcodes (cost=0.00..1257.54 rows=41483 width=22) (actual time=0.027..84.610 rows=41483 loops=1) Total runtime: 161.226 ms ######################################### -- Using KNN-GIST index EXPLAIN ANALYZE SELECT zipcode, lon_lat <-> '(-118.412426,34.096629)' As radius FROM zipcodes ORDER BY lon_lat <-> '(-118.412426,34.096629)' LIMIT 100; ------------------------------------------------------------------ Limit (cost=0.00..12.94 rows=100 width=22) (actual time=0.447..1.892 rows=100 loops=1) -> Index Scan using zipcodes_knn on zipcodes (cost=0.00..5365.93 rows=41483 width=22) (actual time=0.440..1.407 rows=100 loops=1) Order By: (lon_lat <-> '(-118.412426,34.096629)'::point) Total runtime: 2.198 ms -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance