Re: best db schema for time series data?

Bob Lunney <bob_lunney@xxxxxxxxx> · Sat, 20 Nov 2010 08:27:51 -0800 (PST)

--- On Fri, 11/19/10, Robert Klemme <shortcutter@xxxxxxxxxxxxxx> wrote:

> From: Robert Klemme <shortcutter@xxxxxxxxxxxxxx>
> Subject: Re:  best db schema for time series data?
> To: pgsql-performance@xxxxxxxxxxxxxx
> Date: Friday, November 19, 2010, 7:16 PM
> On Fri, Nov 19, 2010 at 10:50 AM,
> Louis-David Mitterrand
> <vindex+lists-pgsql-performance@xxxxxxxxxxx>
> wrote:
> > On Tue, Nov 16, 2010 at 05:28:19PM +0100, Harald Fuchs
> wrote:
> >> In article <4CE2688B.2050000@xxxxxxxxxxxx>,
> >> Arjen van der Meijden <acmmailing@xxxxxxxxxxxx>
> writes:
> >>
> >> > On 16-11-2010 11:50, Louis-David Mitterrand
> wrote:
> >> >> I have to collect lots of prices from web
> sites and keep track of their
> >> >> changes. What is the best option?
> >> >>
> >> >> 1) one 'price' row per price change:
> >> >>
> >> >> create table price (
> >> >> id_price primary key,
> >> >> id_product integer references product,
> >> >> price integer
> >> >> );
> >> >>
> >> >> 2) a single 'price' row containing all
> the changes:
> >> >>
> >> >> create table price (
> >> >> id_price primary key,
> >> >> id_product integer references product,
> >> >> price integer[] -- prices are 'pushed' on
> this array as they change
> >> >> );
> >> >>
> >> >> Which is bound to give the best
> performance, knowing I will often need
> >> >> to access the latest and next-to-latest
> prices?
> >>
> >> > If you mostly need the last few prices, I'd
> definitaly go with the
> >> > first aproach, its much cleaner. Besides, you
> can store a date/time
> >> > per price, so you know when it changed. With
> the array-approach that's
> >> > a bit harder to do.
> >>
> >> I'd probably use a variant of this:
> >>
> >>   CREATE TABLE prices (
> >>     pid int NOT NULL REFERENCES products,
> >>     validTil timestamp(0) NULL,
> >>     price int NOT NULL,
> >>     UNIQUE (pid, validTil)
> >>   );
> >>
> >> The current price of a product is always the row
> with validTil IS NULL.
> >> The lookup should be pretty fast because it can
> use the index of the
> >> UNIQUE constraint.
> 
> Even better: with a partial index lookup should be more
> efficient and
> probably will stay that way even when the number of prices
> increases
> (and the number of products stays the same).  With
> 
> CREATE UNIQUE INDEX current_prices
> ON prices (
>   pid
> )
> WHERE validTil IS NULL;
> 
> I get
> 
> robert=> explain select price from prices where pid =
> 12344 and
> validTil is null;
>                
>              
>    QUERY PLAN
> -----------------------------------------------------------------------------
>  Index Scan using current_prices on prices 
> (cost=0.00..8.28 rows=1 width=4)
>    Index Cond: (pid = 12344)
> (2 rows)
> 
> The index can actually be used here.
> 
> (see attachment)
> 
> > The validTil idea is nice, but you have to manage that
> field with a
> > trigger, right?
> 
> Well, you don't need to.  You can always do
> 
> begin;
> update prices set validTil = current_timestamp
>   where pid = 123 and validTil is NULL;
> insert into prices values ( 123, null, 94 );
> commit;
> 
> But with a trigger it would be more convenient of course.
> 
> Kind regards
> 
> robert
> 
> -- 
> remember.guy do |as, often| as.you_can - without end
> http://blog.rubybestpractices.com/
> 
> -----Inline Attachment Follows-----
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 

Louis,

Someday, as sure as Codd made little relational databases, someone will put an incorrect price in that table, and it will have to be changed, and that change will ripple throughout your system.  You have a unique chance here, at the beginning, to foresee that inevitability and plan for it.  

Take a look at 

  http://en.wikipedia.org/wiki/Temporal_database

and 

  http://pgfoundry.org/projects/temporal/

and anything Snodgrass ever wrote about temporal databases.  Its a fascinating schema design subject, one that comes in very handy in dealing with time-influenced data.

Good luck!

Bob Lunney

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance