On Sun, Oct 4, 2009 at 6:45 PM, <anthony@xxxxxxxxxxxxxx> wrote:
> All:
>
> We have a web application which is growing ... fast. We're currently
> running on (1) quad-core Xeon 2.0GHz with a RAID-1 setup, and 8GB of RAM.
>
> Our application collects a lot of sensor data, which means that we have
> one table with about 8 million rows, and we're adding about 2.5 million
> rows per month.
>
> The problem is, this next year we're anticipating significant growth,
> where we may be adding more like 20 million rows per month (roughly 15GB
> of data).
>
> A row of data might have:
>   The system identifier (int)
>   Date/Time read (timestamp)
>   Sensor identifier (int)
>   Data Type (int)
>   Data Value (double)

One approach that can sometimes help is to use arrays to pack data.
Arrays may or may not work for the data you are collecting: they work
best when you always pull the entire array for analysis rather than a
particular element of it. Arrays help because they pack more data into
each index fetch and let you skip the 20-byte tuple header per reading.
That said, they are an optimization trade-off: you are making one type
of query fast at the expense of others.

In terms of hardware, bulking up memory will only get you so far; sooner
or later you have to come to terms with the fact that you are dealing
with 'big' data and need to make sure your storage can cut the mustard.
Your hardware upgrades should probably focus on the size and quantity of
disk drives in a big RAID 10. Single-user or 'small number of users' big
data queries tend to benefit more from fewer, faster CPU cores.

Also, with big data, you want to make sure your table design and
indexing strategy are as tight as possible.

merlin
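P.S. If you want to experiment with the array idea, a minimal sketch of
what it might look like is below. The table and column names are made up
to match the row layout you described, and the hourly bucket is just an
assumption about how readings could be grouped -- adjust to whatever unit
you always analyze together:

  -- Hypothetical row-per-reading layout (one 20+ byte tuple header per value):
  CREATE TABLE sensor_readings (
      system_id   int,
      read_time   timestamp,
      sensor_id   int,
      data_type   int,
      data_value  double precision
  );

  -- Packed alternative: one row per sensor per hour, readings stored as an array.
  -- The tuple-header cost is paid once per bucket instead of once per reading,
  -- but you must always fetch and unpack the whole bucket -- pulling a single
  -- element is no longer cheap.
  CREATE TABLE sensor_readings_packed (
      system_id    int,
      sensor_id    int,
      data_type    int,
      bucket_start timestamp,           -- e.g. the hour the readings fall in
      data_values  double precision[]   -- all readings for that bucket, in order
  );

  -- A tight composite index on the lookup columns keeps bucket fetches cheap.
  CREATE INDEX sensor_readings_packed_idx
      ON sensor_readings_packed (system_id, sensor_id, data_type, bucket_start);

Whether the packed form wins depends entirely on your query mix: great if
analysis always scans whole buckets, painful if you often need individual
readings by exact timestamp.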