Re: optimizing a geo_distance() proximity query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Mark,

in astronomy we extensively use such kind of query, which we call
radial search or conesearch. There are several algorithms which perform
efficiently such query using spherical coordinates. Specifically, we use our Q3C algorithm, see
http://www.sai.msu.su/~megera/wiki/SkyPixelization for details,
which was designed for PostgreSQL and is freely available.

The paper is http://lnfm1.sai.msu.ru/~math/docs/adass_proceedings2005.pdf
Web site - http://q3c.sf.net/


Oleg

On Sat, 3 Feb 2007, Mark Stosberg wrote:


Hello,

I'm using geo_distance() from contrib/earthdistance would like to find a
way to spend up the geo distance calculation if possible. This is for a
proximity search: "Show me adoptable pets within 250 miles of this
zipcode".

I'm researched a number of approaches to this, but none seem as workable
as I would have hoped.

I read this claim [1] that Jobster used a lookup table of pre-calculated
distances between zipcodes...it took about 100 million rows.

1. http://bostonsteamer.livejournal.com/831325.html

I'd like to avoid that, but I think there's a sound concept in there: we
repeatedly making a complex calculation with the same inputs, and the
outputs are always the same.

The zipdy project [2] used some interesting approaches, both similar to
the large table idea. One variation involved a PL routine that would
look up the result in a cache table. If no result was found, it would
would compute the result and add it to the cache table. Besides
eventually creating millions of rows in the cache table, I tried this
technique and found it was much slower than using geo_distance() without
a cache. Another variation in the zipdy distribution just uses several
smaller cache tables, like one for "zipcodes 25 miles away" and
"zipcodes 50 miles away".  Equally unattractive.

2. http://www.cryptnet.net/fsp/zipdy/

I looked at doing the calculation outside of PostgreSQL, and passing in
the resulting list of zipcodes in an explicit IN() list. This seem
promising at first. Geo::Postalcode (Perl) could do the calculation in
5ms, which seemed to beat PostgreSQL. For a small proximity, I think
that combination might have performed better. However, some places have
close to 5,000 zipcodes within 250 files. I tried putting /that/
resulting list into an explicitly IN() clause, and it got much slower. :)

I did find there are some possible optimizations that can be made to the
Haversine algorithm itself. As this post pointed out [3], we could
pre-convert the lat/lon pair to radians, and also compute their sin()
and cos() values. However, the person suggesting this approach provided
no benchmarks to suggest it was worth it, and I have no evidence so far
that it matters either.

3.
http://www.voxclandestina.com/2006-09-01/optimizing-spatial-proximity-searches-in-sql/

What strikes me to consider at this point are a couple of options:

A. Explore a way add some memory caching or "memoizing" to
geo_distance() so it would hold on to frequently pre-computed values,
but without storing all the millions of possibilities.

B. Look at an alternate implementation. I suspect that given a small
enough radius and the relatively large size of zipcodes, a simpler
representation of the Earth's curvature could be used, with a sufficient
accuracy. Perhaps a cylinder, or even a flat projection... We currently
max out at 250 miles. ( I just discussed this option with my wife, the
math teacher. :)

Advice from other people who have deployed high-performance proximity
searches with PostgreSQL would be appreciated!

  Mark

















---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux