> -----Original Message-----
<snip>
> >
> > The problem is, this next year we're anticipating significant growth,
> > where we may be adding more like 20 million rows per month (roughly 15GB
> > of data).
> >
> > A row of data might have:
> > The system identifier (int)
> > Date/Time read (timestamp)
> > Sensor identifier (int)
> > Data Type (int)
> > Data Value (double)
>
> One approach that can sometimes help is to use arrays to pack data.
> Arrays may or may not work for the data you are collecting: they work
> best when you always pull the entire array for analysis and not a
> particular element of the array. Arrays work well because they pack
> more data into index fetches and you get to skip the 20 byte tuple
> header. That said, they are an 'optimization trade off'...you are
> making one type of query fast at the expense of others.

I recently used arrays for a 'long and thin' table very like those
described here. The tuple header became increasingly significant in our
case. There are some details in my post:

http://www.nabble.com/optimizing-for-temporal-data-behind-a-view-td25490818.html

As Merlin points out, one considerable side-effect of using arrays is
that it restricts the sorts of queries we can perform - i.e. querying
data that is inside an array becomes costly. So we needed to make sure
our user scenarios (requirements) were well understood.

richard

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
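For anyone curious what the array-packing approach might look like in practice, here is a rough sketch. The table and column names are mine, not from the original poster's schema; it assumes readings are rolled up per sensor per hour, which is just one possible grouping:

```sql
-- Hypothetical "long and thin" table: one row per sample.
CREATE TABLE sensor_reading (
    system_id   int,
    read_at     timestamp,
    sensor_id   int,
    data_type   int,
    data_value  double precision
);

-- Packed alternative: one row per sensor/type/hour, samples in an array.
-- Only one 20-odd-byte tuple header per hour instead of per sample,
-- but you can no longer index or cheaply filter individual readings.
CREATE TABLE sensor_reading_packed (
    system_id   int,
    sensor_id   int,
    data_type   int,
    hour_start  timestamp,
    readings    double precision[]   -- one element per sample in the hour
);

-- Roll the narrow rows up into the packed form
-- (array_agg with ORDER BY needs a reasonably recent PostgreSQL):
INSERT INTO sensor_reading_packed
SELECT system_id, sensor_id, data_type,
       date_trunc('hour', read_at),
       array_agg(data_value ORDER BY read_at)
FROM sensor_reading
GROUP BY system_id, sensor_id, data_type, date_trunc('hour', read_at);
```

As noted above, this only pays off if your queries always pull whole arrays; anything that needs to filter on individual values inside `readings` gets slower.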