Re: geographic search engine

Brian Dunning wrote:
> On Jan 11, 2005, at 12:39 PM, Greg Donald wrote:
>
>> This does not take into account the curve of the earth.  In addition
>> you'll need a db with the latitude and longitude for each zip code.

If you only want "close" things to a given location, the curvature of the
earth is pretty insignificant.

I used to have that long-ass trigonometric formula in a user-defined
PostgreSQL function so I could implement geographical search here:
http://chatmusic.com/venues.htm#search

I realized one day that at a distance of a hundred miles or less, I just
didn't *CARE* about curvature of the earth, and replaced that trig with
your basic Cartesian distance.

Made a *HUGE* difference in performance.
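To see how little the curvature matters at these distances, here's a rough sketch (coordinates and the 69-miles-per-degree scale factor are my own illustration, not anything from the poster's code) comparing the full great-circle formula against plain Cartesian distance:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    # Great-circle distance on a sphere with Earth's mean radius in miles.
    r = 3959.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cartesian_miles(lat1, lon1, lat2, lon2):
    # Treat lat/long as planar coordinates: scale longitude by cos(latitude)
    # since east-west degrees shrink away from the equator, then convert
    # degrees to miles (roughly 69 miles per degree of latitude).
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * math.cos(mean_lat)
    dy = lat2 - lat1
    return 69.0 * math.sqrt(dx * dx + dy * dy)

# Downtown Chicago to a point roughly 20 miles away:
print(haversine_miles(41.885, -87.622, 42.10, -87.90))
print(cartesian_miles(41.885, -87.622, 42.10, -87.90))
```

Over ~20 miles the two answers differ by a few hundredths of a mile, and the Cartesian version is just a multiply-add you can push into a WHERE clause.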

It always amazes me that I was so foolish as to listen to all the
"experts" who told me I needed this really hairy complex thing when, in
reality, what I needed was something MUCH simpler.

Oh, and another Free Tip:

Once you load in the 65000+ zip codes for the US (plus whatever other
countries you need/find), you'll be doing a JOIN between that zips table
and your search table.

65000 X (number of records in your table) ====> WAY TOO MANY RECORDS!!!

So it's time to break the cardinal rule of good database design.

Sort of.

Sanely.

Put a longitude and latitude column on your existing table, and default it
to NULL.

Write your business logic so that *ANY* time a zip code is changed, the
longitude/latitude is re-set to NULL.  (Or use triggers or whatever you
want to make this happen.  I don't care.)
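The trigger variant could look something like this sketch. I'm using SQLite here just so it runs standalone (the post is about PostgreSQL, and the "venues" table and column names are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE venues (
    id  INTEGER PRIMARY KEY,
    zip TEXT,
    lat REAL DEFAULT NULL,   -- denormalized copy from the zips table
    lon REAL DEFAULT NULL
);
-- Any time the zip changes, invalidate the cached coordinates:
CREATE TRIGGER venues_zip_changed
AFTER UPDATE OF zip ON venues
WHEN NEW.zip IS NOT OLD.zip
BEGIN
    UPDATE venues SET lat = NULL, lon = NULL WHERE id = NEW.id;
END;
""")
db.execute("INSERT INTO venues (id, zip, lat, lon) VALUES (1, '60601', 41.886, -87.623)")
db.execute("UPDATE venues SET zip = '60605' WHERE id = 1")
row = db.execute("SELECT lat, lon FROM venues WHERE id = 1").fetchone()
print(row)  # (None, None) -- coordinates cleared, waiting for the backfill job
```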

Finally, write a cron job (scheduled task in Windoze) to find N records at
random in your table where the long/lat is NULL, and *copy* over the
long/lat from the zips table.

Now, your search only has to deal with however many records are in your
table.  Not 65000 *times* that many.
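The backfill job is a single UPDATE per run. A sketch of the idea (again in SQLite so it's self-contained; the table names and batch size are my assumptions, not the poster's actual schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE zips (zip TEXT PRIMARY KEY, lat REAL, lon REAL);
CREATE TABLE venues (id INTEGER PRIMARY KEY, zip TEXT, lat REAL, lon REAL);
INSERT INTO zips VALUES ('60601', 41.886, -87.623), ('60605', 41.867, -87.620);
INSERT INTO venues (id, zip) VALUES (1, '60601'), (2, '60605');
""")

N = 100  # batch size: tune so each cron run stays cheap
db.execute("""
    UPDATE venues
    SET lat = (SELECT lat FROM zips WHERE zips.zip = venues.zip),
        lon = (SELECT lon FROM zips WHERE zips.zip = venues.zip)
    WHERE id IN (
        SELECT id FROM venues WHERE lat IS NULL ORDER BY RANDOM() LIMIT ?
    )
""", (N,))
db.commit()
# After enough runs, nothing is left to backfill:
print(db.execute("SELECT COUNT(*) FROM venues WHERE lat IS NULL").fetchone()[0])
```

The random N-at-a-time selection means a bulk import never stalls one cron run for minutes; the NULL rows just drain away over successive runs.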

You're breaking the "rules" of database design in a sensible, maintainable
way for a HUGE performance gain.

Before I figured this out, I don't know how many times I brought my server
to a crawl with a perfectly "reasonable" query...  that involved millions
of intermediary tuples when I really only expected a dozen to actually
come out in the end.

PS You can pay $$$ for the "complete" databases of zips, or use the TIGER
data for free and just interpolate from existing entries to make up
long/lats for new zips.  EG: If tomorrow the USPS creates zip code 60609,
I can be pretty damn sure it's "close enough" to 60601 through 60608, and
just average them to make up a bogus long/lat.  Sure, it's "wrong".  It's
also FREE and "close enough" for what I (and almost for sure you) are
doing.
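The averaging trick is a few lines. A sketch, assuming numerically-adjacent zips are geographically adjacent (the sample coordinates below are invented for illustration):

```python
# Known zip -> (lat, lon) entries; in practice this comes from your zips table.
known = {
    '60601': (41.886, -87.623),
    '60602': (41.883, -87.629),
    '60603': (41.880, -87.625),
}

def guess_latlon(new_zip, known, window=8):
    # Average the coordinates of zips numerically within +/- window
    # of the unknown one.  "Wrong", but close enough for search.
    base = int(new_zip)
    pts = []
    for d in range(-window, window + 1):
        if d == 0:
            continue
        z = str(base + d).zfill(5)
        if z in known:
            pts.append(known[z])
    if not pts:
        return None
    lat = sum(p[0] for p in pts) / len(pts)
    lon = sum(p[1] for p in pts) / len(pts)
    return (round(lat, 3), round(lon, 3))

print(guess_latlon('60604', known))  # (41.883, -87.626)
```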

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

