On Mon, Feb 10, 2014 at 8:45 AM, Mark Wong <markwkm@xxxxxxxxx> wrote: > Hello everybody, > > I was wondering if anyone had any experiences they can share when > designing the time dimension for a star schema and the like. I'm > curious about how well it would work to use a timestamp for the > attribute key, as opposed to a surrogate key, and populating the time > dimension with triggers on insert to the fact tables. This is > something that would have data streaming in (as oppose to bulk > loading) and I think we want time granularity to the minute. > > A simplified example: > > -- Time dimension > CREATE TABLE time ( > datetime TIMESTAMP WITH TIME ZONE NOT NULL, > day_of_week SMALLINT NOT NULL > ); > CREATE UNIQUE INDEX ON time (datetime); > > -- Fact > CREATE TABLE fact( > datetime TIMESTAMP WITH TIME ZONE NOT NULL, > FOREIGN KEY (datetime) REFERENCES time(datetime) > ); > > -- Function to populate the time dimension > CREATE OR REPLACE FUNCTION decompose_timestamp() RETURNS TRIGGER AS $$ > BEGIN > NEW.datetime = date_trunc('minutes', NEW.datetime); > INSERT INTO time (datetime, day_of_week) > VALUES (NEW.datetime, date_part('dow', NEW.datetime)); > RETURN NEW; > EXCEPTION > WHEN unique_violation THEN > -- Do nothing if the timestamp already exists in the dimension table. > RETURN new; > END; $$ > LANGUAGE 'plpgsql'; Failure to inline the date/time in dimension tables a terrible practice IMO. You add a lookup and an expensive subtransaction to each insert. When querying the fact table you tack on a join for every query that does not need to be there (for no benefit I can see). merlin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general