On Wed, Sep 26, 2012 at 5:50 AM, Ryan Kelly <rpkelly22@xxxxxxxxx> wrote: > Hi: > > The size of our database is growing rather rapidly. We're concerned > about how well Postgres will scale for OLAP-style queries over terabytes > of data. Googling around doesn't yield great results for vanilla > Postgres in this application, but generally links to other software like > Greenplum, Netezza, and Aster Data (some of which are based off of > Postgres). Too, there are solutions like Stado. But I'm concerned about > the amount of effort to use such solutions and what we would have to > give up feature-wise. If you want fastish OLAP on postgres you need to do several things. 1: Throw very fast disk arrays at it. Lots of spinners in a linux SW RAID-10 or RAID-0 if your data is easily replaceable work wonders here. 2: Throw lots of memory at it. Memory is pretty cheap. 256G is not unusual for OLAP machines 3: Throw fast CPUs at it. Faster CPUs, especially fewer faster cores, are often helpful. Applied in that order you can get some pretty impressive results. A lot of OLAP stuff needs to read hundreds of gigs at a time from the drive array. An array of 32 15kRPM drives, each reading at ~100MB/s or better can flood your PCI bus at 3200MB/s for reads. Note that most RAID controllers aren't as fast for sequential reads on large drive arrays. Although a battery backed cache can GREATLY improved parallel write speed, it doesn't really make a big difference for big sequential stuff and usually gets in the way here. Memory to cache as much as possible and allow all your queries to do hash joins etc in memory (crank up work_mem as needed, but be careful not to use all your memory up.) Lastly once you've thrown lots of IO and memory at it, a faster CPU can make a bit of a difference too. But honestly I'd rather have a dual core 2GHz CPU on top of 32 spinners with 256G than a 3.6GHz 8 core CPU on top of 4 drives and 32G of RAM. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general