application of KNN code to US zipcode searches?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We perform over 1,000,000 searches each day for "adoptable shelter pets
near your zipcode". We already have adequate performance for these
searches using the "cube" contrib, but the new KNN work in 9.1 seemed
like it might be a promising way to speed this up even further.

I installed PostgreSQL 9.1 on my laptop to evaluate it, using this post
as a reference:
http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/

The first task was to translate a geo-spatial search to use the new KNN
syntax.

I'm most familiar with two approaches to geo-spatial searching with
PostgreSQL. The first is the older "earthdistance" approach, using
"point" types and the "<@>" operator.

The second is the one we are using now, which uses a cube type, the
"cube_distance()" and "earth_box()" method and a GIST index on the cube
type.

Immediately there is a hurdle in that KNN only appears to work with
point types and the <-> operator, which does simple point-to-point
distance, instead of the distance-around-the-earth. Still, I thought
that could be enough of an approximation to test the waters.

I started with some "real world" queries that involved some table joins,
and when those failed to show improvement, I worked with some
reduced-test-case queries.

While I could confirm the new GIST index was being used on the point
type, I couldn't get a query to benchmark better when it was invoked.
I'm wondering if perhaps US zipcode searches aren't good use of this
technology, perhaps because the data set is too small ( About 40,000
zipcodes ).

Given that we can already do GIST-indexed searches with the cube type
that provide good reasonable approximations for zipcode-radius searches,
are others planning to eventually apply the KNN work to US zipcode
searches?

Sample EXPLAIN output and query times are below.

    Mark

EXPLAIN ANALYZE SELECT zipcode,
    lon_lat <-> '(-118.412426,34.096629)' AS radius
    FROM zipcodes ;
-------------------------------------------
 Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483 width=22) (actual
time=0.019..84.543 rows=41483 loops=1)
 Total runtime: 148.129 ms


EXPLAIN ANALYZE SELECT zipcode,
    lon_lat <-> '(-118.412426,34.096629)' As radius
    FROM zipcodes
    ORDER BY lon_lat <-> '(-118.412426,34.096629)';
--------------------------------------------------
 Index Scan using zipcodes_knn on zipcodes  (cost=0.00..5365.93
rows=41483 width=22) (actual time=0.451..141.590 rows=41483 loops=1)
   Order By: (lon_lat <-> '(-118.412426,34.096629)'::point)
 Total runtime: 206.392 ms








-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux