On May 25, 2009, at 10:58 AM, Ramiro Diaz Trepat wrote:
The table with the atmosphere pixels currently has about 140MM
records, and the one with the values about 1000MM records. They should
both grow to about twice this size.
Did you tune postgres to use the available resources? By default it
comes with rather modest settings so it will run on low-spec hardware
without trouble. For a dataset this size modest hardware is obviously
out of the question, and so the default config will likely not be
suitable.
A simple query with no joins like this:
select trajectory from atmospherepoint where moment='1979-05-02
11:45:00' and height >= 12000 and height <= 14000 and longitude >=
180 and longitude <= 190 and latitude >= 0 and latitude <= 10;
I think the BETWEEN operator would make that query a bit easier on the
eyes, like so:
select trajectory from atmospherepoint where moment='1979-05-02
11:45:00' and height between 12000 and 14000 and longitude between 180
and 190 and latitude between 0 and 10;
is taking almost 4 minutes on a decent multi-core server. Also, the
moment equality test should yield no more than 2MM rows, so I
thought that would make things a lot faster.
The explain returns the suspicious "Seq Scan" up front:
Seq Scan on atmospherepoint (cost=0.00..5461788.08 rows=134 width=8)
Filter: ((moment = '1979-05-02 11:45:00'::timestamp without time
zone) AND (height >= 12000::double precision) AND (height <=
14000::double precision) AND (longitude >= 180::double precision)
AND (longitude <= 190::double precision) AND (latitude >= 0::double
precision) AND (latitude <= 10::double precision))
That query is expecting 134 rows; how accurate is that estimate? An
explain analyse will show you (though it will take those 4 minutes in
this case).
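For example, using the BETWEEN form from above:
explain analyse
select trajectory from atmospherepoint
 where moment = '1979-05-02 11:45:00'
   and height between 12000 and 14000
   and longitude between 180 and 190
   and latitude between 0 and 10;
That shows the actual row counts next to the planner's estimates.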
I think you'll need lots of memory if you want to keep those indexes
in memory. If I do a rough byte-count of the datatypes involved times
the number of records I think just the index data (w/o metadata) takes
up 40B * 140M rows = 5.6GB. Scanning an index on disk is likely to be
slow, which may well be one of the reasons PG decides on a sequential
scan.
What is the distribution of the data you're querying for? If it's all
over the place then the planner made a pretty good guess that a
sequential scan is optimal here; random disk I/O is far slower than
sequential I/O.
Before solving that you'll want to figure out whether this is a
typical query, or whether distributing the data differently would hurt
other queries. If it's a typical query, then clustering your data on
an index and/or partitioning will help. If it's not, it will still
help this type of query, but it may hurt other types of queries.
You'll want to optimise in the right direction.
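For the clustering route, a rough sketch (the combined index here is just
an example, pick columns that match your typical query; be aware that
CLUSTER rewrites the whole table and locks it while it runs):
-- example only: cluster the table on an index matching the typical query
CREATE INDEX atm_moment_height_idx ON atmospherepoint (moment, height);
CLUSTER atmospherepoint USING atm_moment_height_idx;   -- 8.3 syntax
ANALYZE atmospherepoint;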
As other people have said already, partitioning is likely to help here,
and your data seems very suitable for it. I think I'd divide the
coordinates into a number of latitude/longitude slices (a grid seems
suitable to me, but I don't deal with atmospheric data). With 32
slices you'll have <500k records per slice on average; whether that's
sufficiently small you'll have to test.
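A minimal sketch of what one such slice could look like using
constraint-exclusion partitioning (the child table name and ranges below
are made up; you'd create one child per slice and route inserts into the
right one):
-- parent table stays empty, the children hold the actual rows
CREATE TABLE atmospherepoint_lon180_lat0 (
    CHECK (longitude >= 180 AND longitude < 190
       AND latitude  >= 0   AND latitude  < 10)
) INHERITS (atmospherepoint);
CREATE INDEX atm_lon180_lat0_moment_idx
    ON atmospherepoint_lon180_lat0 (moment, height);
-- lets the planner skip slices whose CHECK constraints can't match
SET constraint_exclusion = on;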
Alternatively, if your data is clustered (on disk) by say longitude/
latitude it may already help to define partial indexes on longitude
latitude ranges, for example:
CREATE INDEX atm_section1_idx ON atmospherepoint (longitude, latitude, height)
WHERE longitude BETWEEN 180 AND 190 AND latitude BETWEEN 0 AND 10;
This is very similar to partitioning your data, but it doesn't require
splitting the data up across multiple partitions (tables). This could
be faster if you have queries spanning multiple table partitions, but
if your data isn't clustered appropriately, finding it on disk will
require more random I/O than a partitioned layout would.
Choices choices... The only way to find out what works best is to
test, I'm afraid.
but I have created indices for every column in the table and I have
also run ANALYZE and VACUUM:
Table "public.atmospherepoint2"
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
trajectory | integer | not null
moment | timestamp without time zone | not null
longitude | real | not null
latitude | real | not null
height | real | not null
Indexes:
"atmospherepoint2_pkey" PRIMARY KEY, btree (id)
"atm_height_idx" btree (height)
"atm_latitude_idx" btree (latitude)
"atm_longitude_idx" btree (longitude)
"atm_moment_idx" btree (moment)
"atm_trajectory_idx" btree (trajectory)
Foreign-key constraints:
"atmospherepoint2_trajectory_fkey" FOREIGN KEY (trajectory)
REFERENCES trajectory2(id)
You do realise that floating point arithmetic isn't exact, don't you?
If that matters to you, you're probably better off using the numeric
type, although FP arithmetic is apparently faster.
That aside, you're probably better off with a postgis type for the co-
ordinates, as that's designed for this kind of data.
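As a rough sketch of what that could look like (this assumes PostGIS is
installed; the column and index names are made up):
-- add a geometry column, fill it from the existing real columns,
-- and put a GiST index on it for bounding-box searches
SELECT AddGeometryColumn('public', 'atmospherepoint', 'location', 4326, 'POINT', 2);
UPDATE atmospherepoint
   SET location = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
CREATE INDEX atm_location_gist_idx ON atmospherepoint USING gist (location);
-- queries can then use the bounding-box overlap operator against that index:
-- ... WHERE location && ST_SetSRID('BOX3D(180 0, 190 10)'::box3d, 4326)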
I will very much welcome any advice or pointers on how to tune these
tables to work faster.
The tables don't change at all once the data has been loaded, they
are only used for queries.
Thank you very much.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.