Search Postgresql Archives

Re: PostgreSQL, OLAP, and Large Clusters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 26, 2012 at 03:18:16PM -0600, Scott Marlowe wrote:
> On Wed, Sep 26, 2012 at 5:50 AM, Ryan Kelly <rpkelly22@xxxxxxxxx> wrote:
> > Hi:
> >
> > The size of our database is growing rather rapidly. We're concerned
> > about how well Postgres will scale for OLAP-style queries over terabytes
> > of data. Googling around doesn't yield great results for vanilla
> > Postgres in this application, but generally links to other software like
> > Greenplum, Netezza, and Aster Data (some of which are based off of
> > Postgres). Too, there are solutions like Stado. But I'm concerned about
> > the amount of effort to use such solutions and what we would have to
> > give up feature-wise.
> 
> If you want fastish OLAP on postgres you need to do several things.
> 
> 1: Throw very fast disk arrays at it.  Lots of spinners in a linux SW
> RAID-10 or RAID-0 if your data is easily replaceable work wonders
> here.
> 2: Throw lots of memory at it.  Memory is pretty cheap.  256G is not
> unusual for OLAP machines
> 3: Throw fast CPUs at it.  Faster CPUs, especially fewer faster cores,
> are often helpful.
What do you mean by "fewer faster cores"? Wouldn't "more faster cores"
be better?

> Applied in that order you can get some pretty impressive results.
> 
> A lot of OLAP stuff needs to read hundreds of gigs at a time from the
> drive array.  An array of 32 15kRPM drives, each reading at ~100MB/s
> or better can flood your  PCI bus at 3200MB/s for reads.  Note that
> most RAID controllers aren't as fast for sequential reads on large
> drive arrays.  Although a battery backed cache can GREATLY improved
> parallel write speed, it doesn't really make a big difference for big
> sequential stuff and usually gets in the way here.
> 
> Memory to cache as much as possible and allow all your queries to do
> hash joins etc in memory (crank up work_mem as needed, but be careful
> not to use all your memory up.)
> 
> Lastly once you've thrown lots of IO and memory at it, a faster CPU
> can make a bit of a difference too.  But honestly I'd rather have a
> dual core 2GHz CPU on top of 32 spinners with 256G than a 3.6GHz 8
> core CPU on top of 4 drives and 32G of RAM.
All of this seems like great advice.

Thanks,
-Ryan Kelly


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux