Hi All,

I have a large quantity of temporal data, 6 billion rows, which I would like to put into a table so I can exploit SQL datetime queries. Each row represents a geophysical observation at a particular time and place. The data is effectively read-only, i.e. updates will be very infrequent. The rows are very 'narrow' (~24 bytes of data per row).

When I ingested the data into PostgreSQL one row at a time, I discovered that the per-row overhead is significant (pg 8.3.7). The projected resources required to host this table prohibit that simple approach. To reduce the cost of the row overhead, I tried storing a whole minute's worth of data in an array, so that I only require one row per minute. The total row count decreased by a factor of 60, and the resources required became realistic. My schema is thus:

    CREATE TABLE geodata1sec (
        obstime        TIMESTAMP WITHOUT TIME ZONE NOT NULL,
        statid         SMALLINT NOT NULL,
        geovalue_array REAL[3][60] NOT NULL
    );

and after ingesting, I add these indexes:

    ALTER TABLE geodata1sec ADD PRIMARY KEY (obstime, statid);
    CREATE INDEX geodata1sec_statid_idx ON geodata1sec (statid);

Storing whole minutes in a row, with the data in an array, has the desired effect of making the table size on disk, and the index size in memory, manageable. However, my queries now need to be aware of this schema design decision. The following query runs nice and quick, but obviously doesn't return all the relevant results (because second resolution is specified):

    EXPLAIN ANALYZE SELECT * FROM geodata1sec
    WHERE obstime BETWEEN '2004-10-21 02:03:04' AND '2004-10-21 02:04:08';

                                  QUERY PLAN
    --------------------------------------------------------------------------
     Index Scan using geodata1sec_pkey on geodata1sec  (cost=0.00..38.19 rows=12 width=762) (actual time=0.071..0.148 rows=13 loops=1)
       Index Cond: ((obstime >= '2004-10-21 02:03:04'::timestamp without time zone) AND (obstime <= '2004-10-21 02:04:08'::timestamp without time zone))
     Total runtime: 0.292 ms
    (3 rows)

...
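(For a rough sense of where the overhead went: each heap tuple in 8.3 carries roughly a 23-byte header plus a 4-byte line pointer, so for ~24-byte rows the overhead exceeds the data. The actual sizes are best checked with the admin functions; the following is a sketch, not measured output:)

    -- Back-of-envelope: per-tuple overhead (~23-byte header + 4-byte line
    -- pointer, before alignment padding) roughly doubles each 24-byte row.
    -- Check the real on-disk cost of the packed table and its indexes:
    SELECT pg_size_pretty(pg_relation_size('geodata1sec'))       AS heap_size,
           pg_size_pretty(pg_total_relation_size('geodata1sec')) AS heap_plus_indexes;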
So, I constructed a view which presents my data as I originally intended. This also means that I don't have to give my applications detailed knowledge of the schema. The view is:

    CREATE VIEW geodataview AS
    SELECT obstime + (s.a*5 || ' seconds')::INTERVAL AS obstime,
           statid,
           geovalue_array[s.a+1][1] AS x_mag,
           geovalue_array[s.a+1][2] AS y_mag,
           geovalue_array[s.a+1][3] AS z_mag
    FROM generate_series(0, 11) AS s(a), geodata1sec;

Now my query returns _all_ the relevant data. However, it takes a long time. If I analyse the query I get:

    EXPLAIN ANALYZE SELECT * FROM geodataview
    WHERE obstime BETWEEN '2004-10-21 02:03:04' AND '2004-10-21 02:04:08';

                                  QUERY PLAN
    --------------------------------------------------------------------------
     Nested Loop  (cost=13.50..2314276295.50 rows=4088000000 width=766) (actual time=2072612.668..3081010.104 rows=169 loops=1)
       Join Filter: (((geodata1sec.obstime + ((((s.a * 5))::text || ' seconds'::text))::interval) >= '2004-10-21 02:03:04'::timestamp without time zone) AND ((geodata1sec.obstime + ((((s.a * 5))::text || ' seconds'::text))::interval) <= '2004-10-21 02:04:08'::timestamp without time zone))
       ->  Seq Scan on geodata1sec  (cost=0.00..4556282.00 rows=36792000 width=762) (actual time=17.072..414620.213 rows=36791999 loops=1)
       ->  Materialize  (cost=13.50..23.50 rows=1000 width=4) (actual time=0.002..0.027 rows=12 loops=36791999)
             ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.075..0.102 rows=12 loops=1)
     Total runtime: 3081010.613 ms
    (6 rows)

This is clearly not going to perform well enough for any practical application. However, it struck me that others might have needed similar functionality for time data, so I thought I would air my experience here. Is it feasible to modify the query planner to make better decisions when dealing with time data behind a view? Are there any alternatives to vanilla PostgreSQL for storing this type of data? I'm imagining something like PostGIS, but for time-based data.
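(One workaround sketch, untested: the planner can't push the range condition through the computed obstime expression, but a redundant condition on the base table's raw obstime column, widened to cover the minute containing the range start, would let it use the primary-key index, with the exact per-second filter applied on top:)

    -- Untested sketch: constrain the stored minute-resolution obstime so the
    -- (obstime, statid) index is usable, then re-apply the second-resolution
    -- filter on the computed timestamp. date_trunc widens the lower bound to
    -- the start of its minute so no samples are missed.
    SELECT obstime + (s.a*5 || ' seconds')::INTERVAL AS obstime,
           statid,
           geovalue_array[s.a+1][1] AS x_mag,
           geovalue_array[s.a+1][2] AS y_mag,
           geovalue_array[s.a+1][3] AS z_mag
    FROM generate_series(0, 11) AS s(a), geodata1sec
    WHERE geodata1sec.obstime
              BETWEEN date_trunc('minute', TIMESTAMP '2004-10-21 02:03:04')
                  AND TIMESTAMP '2004-10-21 02:04:08'
      AND geodata1sec.obstime + (s.a*5 || ' seconds')::INTERVAL
              BETWEEN '2004-10-21 02:03:04' AND '2004-10-21 02:04:08';

The awkward part is that the widened bounds can't live inside the view definition, since they depend on the query's own range; the application (or a set-returning function taking the two timestamps as parameters) would have to supply them.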
Your time and thoughts are appreciated,

Cheers,
Richard

--
Scanned by iCritical.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance