Search Postgresql Archives

Re: General data warehousing questions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Scott Marlowe wrote:
On Sun, Oct 5, 2008 at 7:48 PM, Sean Davis <sdavis2@xxxxxxxxxxxx> wrote:
I am looking at the prospect of building a data warehouse of genomic
sequence data.  The machine that produces the data adds about
300million rows per month in a central fact table and we will
generally want the data to be "online".  We don't need instantaneous
queries, but we would be using the data for data mining purposes and
running some "real-time" queries for reporting and research purposes.
I have had the pleasure of working on an Netezza box where this type
of thing is quite standard, but we don't have that access anymore, so
I'm looking for hints on using postgres in a data warehousing/mining
environment.  Any suggestions on how DDL, loading, backup, indexing,
or (to a certain extent) hardware?

I assume you're familiar with stuff like star schemas.

For loading you might want to look at things like pg_bulkloader, copy,

For indexing remember that you have partial and fuctional indexes in
postgresql and they can come in quite handy.

For backup of large changing databases look into PITR.

As for hardware, you need enough CPU horsepower and memory to handle
however many users you're gonna have running simultaneous queries, but
more important is usually the drive subsystem.  Throwing drives,
battery backed cache and a good RAID controller can make a big
difference.  Usual RAID-10 is preferred, as writes are much faster. If
you're really squeezed for space and money then you can use RAID-5 but
it has some seriously negative performance implications for parallel
load handling and write speed.


You may also be interested in looking at BioPostgres
http://phenomics.cs.ucla.edu/



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux