On Sun, Oct 4, 2009 at 6:45 PM, <anthony@xxxxxxxxxxxxxx> wrote:
> All:
>
> We have a web application which is growing ... fast. We're currently
> running on (1) quad-core Xeon 2.0GHz with a RAID-1 setup, and 8GB of RAM.
>
> Our application collects a lot of sensor data, which means that we have
> one table with about 8 million rows, and we're adding about 2.5 million
> rows per month.
>
> The problem is, this next year we're anticipating significant growth,
> where we may be adding more like 20 million rows per month (roughly 15GB
> of data).
>
> A row of data might have:
>   The system identifier (int)
>   Date/Time read (timestamp)
>   Sensor identifier (int)
>   Data Type (int)
>   Data Value (double)

One approach that can sometimes help is to use arrays to pack data.
Arrays may or may not work for the data you are collecting: they work
best when you always pull the entire array for analysis rather than a
particular element of it. Arrays help because they pack more data into
each index fetch and let you skip the 20-byte tuple header per reading.
That said, they are an optimization trade-off: you are making one type
of query fast at the expense of others.

In terms of hardware, bulking up memory will only get you so far; sooner
or later you have to come to terms with the fact that you are dealing
with 'big' data and need to make sure your storage can cut the mustard.
Your hardware upgrades should probably focus on the size and quantity of
disk drives in a big RAID 10. Single-user or 'small number of users' big
data queries tend to benefit more from fewer, faster CPU cores.

Also, with big data, you want to make sure your table design and
indexing strategy are as tight as possible.

merlin
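P.S. If you want to experiment with the array idea, a minimal sketch of
what it might look like is below. The table and column names are made up
to match the row layout you described, and the hourly bucket is just an
assumption about how readings could be grouped -- adjust to whatever unit
you always analyze together:

  -- Hypothetical row-per-reading layout (one 20+ byte tuple header per value):
  CREATE TABLE sensor_readings (
      system_id   int,
      read_time   timestamp,
      sensor_id   int,
      data_type   int,
      data_value  double precision
  );

  -- Packed alternative: one row per sensor per hour, readings stored as an array.
  -- The tuple-header cost is paid once per bucket instead of once per reading,
  -- but you must always fetch and unpack the whole bucket -- pulling a single
  -- element is no longer cheap.
  CREATE TABLE sensor_readings_packed (
      system_id    int,
      sensor_id    int,
      data_type    int,
      bucket_start timestamp,           -- e.g. the hour the readings fall in
      data_values  double precision[]   -- all readings for that bucket, in order
  );

  -- A tight composite index on the lookup columns keeps bucket fetches cheap.
  CREATE INDEX sensor_readings_packed_idx
      ON sensor_readings_packed (system_id, sensor_id, data_type, bucket_start);

Whether the packed form wins depends entirely on your query mix: great if
analysis always scans whole buckets, painful if you often need individual
readings by exact timestamp.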