Re: Two billion records ok?

Nick Bower wrote:

> We're considering using PostgreSQL for storing gridded metadata - each point of our grids has a variety of metadata attached to it (including lat/lon, measurements, etc.) and would constitute a record in PostgreSQL + PostGIS.
>
> Size-wise, grids are about 4000x700 and are collected twice daily over, say, 10 years. As mentioned, each record would have up to 50 metadata attributes (columns), including geom, floats, varchars, etc.
>
> So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if we want to query on datetimes, PostGIS lat/lon, and integer-based metadata flags?

Hmmmm... 2 billion looks optimistic: 4000 x 700 x 2 x 365 x 10 = 2.044e+10, which is about 20 billion, not 2.

I'm currently working with a "table" of over 200,000,000 records. With a clustered index and underlying partitioning, response times are more than acceptable: a 25-wide self relation (left outer join) over 3 months of data (records arrive at around 40/minute) takes about 2 minutes, and a simple query with a WHERE clause on timestamp takes a few seconds at most. This is on a 32-bit Intel system with only 2 GB of memory and mirrored 7200 RPM SATA hard drives.
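For concreteness, that "simple query" is just a timestamp range scan. A minimal sketch, with hypothetical names ("obs" and "obs_time" stand in for the real table and column):

  -- 3 months of data; with the clustered timestamp index, this kind
  -- of range scan is what returns in a few seconds on the 200M rows.
  SELECT *
  FROM   obs
  WHERE  obs_time >= '2007-01-01'
    AND  obs_time <  '2007-04-01';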

I'd suggest partitioning on timestamp, per year at least, and a clustered index on timestamp within each partition; see the sketch below. It should be viable if your hardware is up to it.
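Roughly like this, using inheritance-based partitioning (a sketch only: table and column names are assumptions, and the real table would carry the ~50 metadata columns; on releases before 8.3 the CLUSTER syntax is "CLUSTER index ON table" instead):

  -- Parent table: one row per grid point per acquisition.
  CREATE TABLE grid_obs (
      obs_time  timestamp NOT NULL,
      geom      geometry,     -- PostGIS point (lat/lon)
      qc_flag   integer
      -- ... remaining metadata columns ...
  );

  -- One child table per year; the CHECK constraint lets the planner
  -- skip irrelevant years when constraint_exclusion is enabled.
  CREATE TABLE grid_obs_2006 (
      CHECK (obs_time >= '2006-01-01' AND obs_time < '2007-01-01')
  ) INHERITS (grid_obs);

  CREATE INDEX grid_obs_2006_time_idx ON grid_obs_2006 (obs_time);

  -- Physically order the partition by timestamp (re-run after bulk loads).
  CLUSTER grid_obs_2006 USING grid_obs_2006_time_idx;

With that in place, a WHERE clause on obs_time only touches the relevant years' partitions, and clustering keeps rows with adjacent timestamps physically adjacent on disk.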

I'd also strongly recommend a suitable platform: 64-bit Linux on AMD64 or Opteron, with as much memory and the fastest RAID setup you can afford. Make sure you use a fully 64-bit build of PostgreSQL/PostGIS on this platform as well.

If the same grid is being resampled, then a separate table defining the grid, joined on a grid-cell ID to the main (partitioned) table, may improve performance (and reduce data volume). A sketch follows.
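Again hypothetical names; here a cell_id column replaces the per-row geom column from the sketch above:

  -- Store each cell's geometry once instead of repeating it for
  -- every acquisition (~7300 times over 10 years, twice daily).
  CREATE TABLE grid_cell (
      cell_id  integer PRIMARY KEY,
      geom     geometry      -- PostGIS point (or cell polygon)
  );

  -- grid_obs then carries cell_id rather than geom; queries that
  -- need the geometry join back to grid_cell:
  SELECT o.obs_time, o.qc_flag, c.geom
  FROM   grid_obs  o
  JOIN   grid_cell c ON c.cell_id = o.cell_id
  WHERE  o.obs_time >= '2006-01-01'
    AND  o.obs_time <  '2006-04-01';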

I wouldn't expect instantaneous answers from it, but I don't know of any reason it won't work. It depends very much on what level of performance is considered acceptable.

I'm also using PostGIS grids with up to a few million cells, and spatial joins to a million or so tracklines, to generate gridded models for analysis in R (roughly the query sketched below). You seem to be scaling this up from where I'm at, so I'd like to know how you get on.
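That sort of spatial join looks roughly like this (a sketch only; the trackline table and its columns are assumptions):

  -- Count tracklines crossing each grid cell, ready for export to R.
  -- A GiST index makes this scale:
  --   CREATE INDEX trackline_geom_idx ON trackline USING gist (geom);
  SELECT c.cell_id, count(t.track_id) AS n_tracks
  FROM   grid_cell c
  LEFT JOIN trackline t ON ST_Intersects(t.geom, c.geom)
  GROUP  BY c.cell_id;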


Cheers,

  Brent Wood


