On Wed, Sep 26, 2012 at 03:18:16PM -0600, Scott Marlowe wrote: > On Wed, Sep 26, 2012 at 5:50 AM, Ryan Kelly <rpkelly22@xxxxxxxxx> wrote: > > Hi: > > > > The size of our database is growing rather rapidly. We're concerned > > about how well Postgres will scale for OLAP-style queries over terabytes > > of data. Googling around doesn't yield great results for vanilla > > Postgres in this application, but generally links to other software like > > Greenplum, Netezza, and Aster Data (some of which are based off of > > Postgres). Too, there are solutions like Stado. But I'm concerned about > > the amount of effort to use such solutions and what we would have to > > give up feature-wise. > > If you want fastish OLAP on postgres you need to do several things. > > 1: Throw very fast disk arrays at it. Lots of spinners in a linux SW > RAID-10 or RAID-0 if your data is easily replaceable work wonders > here. > 2: Throw lots of memory at it. Memory is pretty cheap. 256G is not > unusual for OLAP machines > 3: Throw fast CPUs at it. Faster CPUs, especially fewer faster cores, > are often helpful. What do you mean by "fewer faster cores"? Wouldn't "more faster cores" be better? > Applied in that order you can get some pretty impressive results. > > A lot of OLAP stuff needs to read hundreds of gigs at a time from the > drive array. An array of 32 15kRPM drives, each reading at ~100MB/s > or better can flood your PCI bus at 3200MB/s for reads. Note that > most RAID controllers aren't as fast for sequential reads on large > drive arrays. Although a battery backed cache can GREATLY improved > parallel write speed, it doesn't really make a big difference for big > sequential stuff and usually gets in the way here. > > Memory to cache as much as possible and allow all your queries to do > hash joins etc in memory (crank up work_mem as needed, but be careful > not to use all your memory up.) > > Lastly once you've thrown lots of IO and memory at it, a faster CPU > can make a bit of a difference too. But honestly I'd rather have a > dual core 2GHz CPU on top of 32 spinners with 256G than a 3.6GHz 8 > core CPU on top of 4 drives and 32G of RAM. All of this seems like great advice. Thanks, -Ryan Kelly -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general