Re: application of KNN code to US zipcode searches?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Mark,

we investigating pgsphere http://pgsphere.projects.postgresql.org/, if we could add KNN support.


Oleg
On Thu, 17 Feb 2011, Mark Stosberg wrote:


I thought the benefit of KNN was that you could retrieve the rows in
distance order, so that a query for the closest 20 locations (for
example) would be very fast.  I wouldn't have expected it to be
helpful when you're selecting all the rows regardless of distance.

Kevin,

Thanks for the feedback. You are right that my "reduced test case"
wasn't a good approximation. I added a limit, to simulate finding the
100 zipcodes closest to 90210.

Below I compare 4 approaches to the same query:

1. Cube search
2. Earth Distance Search
3. Simple point distance (no index)
4. Simple point distance (KNN)

Now KNN benchmarks to be almost 100x faster! That's very promising.
Then there's only the issue that simple point distance is not expected
to be a good enough approximation of earth-distances. Perhaps that can
be solved by pre-computing coordinates based on the lat/long pairs....
much like the map projections used to present a curved surface on a flat
map? Given that's OK to be be a few miles off, it seems we have some
leeway here.

Recommendations?

   Mark

EXPLAIN ANALYZE
SELECT zipcode,
   cube_distance( '(-2513120.64361786, -4645511.0460328,
3575538.9507084)', zipcodes.earth_coords)/1609.344 AS radius
   FROM zipcodes ORDER BY radius LIMIT 100;

---------------------------------------------------------------
Limit  (cost=2946.70..2946.95 rows=100 width=62) (actual
time=167.650..168.064 rows=100 loops=1)
  ->  Sort  (cost=2946.70..3050.40 rows=41483 width=62) (actual
time=167.644..167.829 rows=100 loops=1)
        Sort Key: ((cube_distance('(-2513120.64361786,
-4645511.0460328, 3575538.9507084)'::cube, earth_coords) /
1609.344::double precision))
        Sort Method: top-N heapsort  Memory: 20kB
        ->  Seq Scan on zipcodes  (cost=0.00..1361.24 rows=41483
width=62) (actual time=0.030..90.807 rows=41483 loops=1)
Total runtime: 168.300 ms

############################################################3

-- Using Earthdistance
EXPLAIN ANALYZE SELECT zipcode,
   lon_lat <@> '(-118.412426,34.096629)' As radius
   FROM zipcodes
   ORDER BY lon_lat <@> '(-118.412426,34.096629)'
   LIMIT 100;

------------------------------------------------------------
Limit  (cost=2842.99..2843.24 rows=100 width=22) (actual
time=187.995..188.451 rows=100 loops=1)
  ->  Sort  (cost=2842.99..2946.70 rows=41483 width=22) (actual
time=187.989..188.149 rows=100 loops=1)
        Sort Key: ((lon_lat <@> '(-118.412426,34.096629)'::point))
        Sort Method: top-N heapsort  Memory: 20kB
        ->  Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483
width=22) (actual time=0.033..108.203 rows=41483 loops=1)
Total runtime: 188.660 ms

##########################################

Using simple point distance, but with no Gist Index:

EXPLAIN ANALYZE SELECT zipcode,
   lon_lat <-> '(-118.412426,34.096629)' As radius
   FROM zipcodes
   ORDER BY lon_lat <-> '(-118.412426,34.096629)'
   LIMIT 100;

--------------------------------------------------------
Limit  (cost=2842.99..2843.24 rows=100 width=22) (actual
time=160.574..161.057 rows=100 loops=1)
  ->  Sort  (cost=2842.99..2946.70 rows=41483 width=22) (actual
time=160.568..160.691 rows=100 loops=1)
        Sort Key: ((lon_lat <-> '(-118.412426,34.096629)'::point))
        Sort Method: top-N heapsort  Memory: 20kB
        ->  Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483
width=22) (actual time=0.027..84.610 rows=41483 loops=1)
Total runtime: 161.226 ms

#########################################

-- Using KNN-GIST index
EXPLAIN ANALYZE SELECT zipcode,
   lon_lat <-> '(-118.412426,34.096629)' As radius
   FROM zipcodes
   ORDER BY lon_lat <-> '(-118.412426,34.096629)'
   LIMIT 100;
------------------------------------------------------------------
Limit  (cost=0.00..12.94 rows=100 width=22) (actual time=0.447..1.892
rows=100 loops=1)
  ->  Index Scan using zipcodes_knn on zipcodes  (cost=0.00..5365.93
rows=41483 width=22) (actual time=0.440..1.407 rows=100 loops=1)
        Order By: (lon_lat <-> '(-118.412426,34.096629)'::point)
Total runtime: 2.198 ms




	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux