Re: Simple query, 10 million records...MySQL ten times faster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Folks,

we in astronomy permanently work with billiards objects with spherical
atributes and have several sky-indexing schemes. See my page
for links http://www.sai.msu.su/~megera/wiki/SkyPixelization

We have q3c package for PostgreSQL available from q3c.sf.net, which we use in production with terabytes-sized database.

Oleg
On Thu, 26 Apr 2007, Scott Marlowe wrote:

On Tue, 2007-04-24 at 16:26, zardozrocks wrote:
I have this table:

CREATE TABLE test_zip_assoc (
    id serial NOT NULL,
    f_id integer DEFAULT 0 NOT NULL,
    lat_radians numeric(6,5) DEFAULT 0.00000 NOT NULL,
    long_radians numeric(6,5) DEFAULT 0.00000 NOT NULL
);

Like someone else mentioned numeric types are SLOW.  See if you can use
integers, or at least floats.

I also wonder if you might be better served with geometric types and
GiST indexes on them than using your lat / long grid.  With a geometric
type, you could define the lat / long as a point and use geometric
operations with it.

See the pgsql manual:

http://www.postgresql.org/docs/8.1/static/datatype-geometric.html
http://www.postgresql.org/docs/8.1/static/functions-geometry.html

It's basically a table that associates some foreign_key (for an event,
for instance) with a particular location using longitude and
latitude.  I'm basically doing a simple proximity search.  I have
populated the database with *10 million* records.  I then test
performance by picking 50 zip codes at random and finding the records
within 50 miles with a query like this:

I assume that there aren't 10 million zip codes, right?

Are you storing the lat / long of the individual venues?  Or the zip
codes?  If you're storing the lat / long of the zips, then I can't
imagine there are 10 million zip codes.  If you could use the lat / long
numbers to find the zip codes that are in your range, then join that to
a venue table that fks off of the zip code table, I would think it would
be much faster, as you'd have a smaller data set to trundle through.

SELECT id
	FROM test_zip_assoc
	WHERE
		lat_radians > 0.69014816041
		AND lat_radians < 0.71538026567
		AND long_radians > -1.35446228028
		AND long_radians < -1.32923017502


On my development server (dual proc/dual core Opteron 2.8 Ghz with 4GB
ram) this query averages 1.5 seconds each time it runs after a brief
warmup period.  In PostGreSQL it averages about 15 seconds.

I wonder how well it would run if you had 10, 20, 30, 40 etc... users
running it at the same time.  My guess is that you'll be very lucky to
get anything close to linear scaling in any database.  That's because
this is CPU / Memory bandwidth intensive, so it's gonna kill your DB.
OTOH, if it was I/O bound you could throw more hardware at it (bb cache
RAID controller, etc)

Both of those times are too slow.  I need the query to run in under a
second with as many as a billion records.  I don't know if this is
possible but I'm really hoping someone can help me restructure my
indexes (multicolumn?, multiple indexes with a 'where' clause?) so
that I can get this running as fast as possible.

You're trying to do a whole lot of processing in a little time.  You're
either gonna have to accept a less exact answer (i.e. base it on zip
codes) or come up with some way of mining the data for the answers ahead
of time, kind of like a full text search for lat and long.

So, have you tried what I suggested about increasing shared_buffers and
work_mem yet?

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

               http://www.postgresql.org/about/donate


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux