Andrew Smith <laconical@xxxxxxxxx> writes:
> I'm a beginner when it comes to Postgresql, and have a table design question
> about a project I'm currently working on.  I have 1500 data items that need
> to be copied every minute from an external system into my database.  The
> items have a timestamp, an identifier and a value.  For example:

> 12/10/2008 05:00  ID_ABC  14
> 12/10/2008 05:01  ID_ABC  17
> 12/10/2008 05:02  ID_ABC  13

> Pretty simple stuff.  The current solution (implemented using SQL Server a
> few years ago) looks like this (an approximation using Postgresql syntax):

> CREATE TABLE "DataImport"
> (
>   "DataImportID" serial NOT NULL PRIMARY KEY,
>   "Time" timestamp without time zone NOT NULL,
>   "ID_ABC" integer NOT NULL,
>   "ID_DEF" integer NOT NULL,
>   "ID_HIJ" integer NOT NULL,
>   etc
> );

So the table would have ~1500 columns?  You definitely don't want to do it
that way in Postgres --- you'd be way too close to the maximum column count
limitation.

> My initial thought for the design of the new solution looks like this:

> CREATE TABLE "DataImport"
> (
>   "DataImportID" serial NOT NULL PRIMARY KEY,
>   "Time" timestamp without time zone NOT NULL,
>   "Identifier" text NOT NULL,
>   "Value" integer NOT NULL
> );

DataImportID isn't a natural primary key here, is it?  Or is it just meant
as an artificial one?  If so, consider not bothering with it at all ---
(Time, Identifier) seems like a perfectly good natural key, and with such
short records the extra space for a serial column is not negligible.

Anyway, my answer would definitely lean towards using this normalized
representation, if all the data values are integers.  (If they're not,
it gets messier...)

> Users will then be doing regular queries on this data (say, a few hundred
> times per day), such as:

> SELECT "Time", "Value" FROM "DataImport" WHERE "Identifier" = 'ID_ABC' AND
> "Time" between '2008-11-07' and '2008-11-11';

An index on (Identifier, Time) (not the other way around) should work well
for that type of query.

> My concern is that 1500 values * 1440 minutes per day = 2,160,000 records.
> Add this up over the course of a month (the length of time I need to keep
> the data in this table) and I'll have more than 60 million records being
> stored in there.

That's not an especially big table.  However it might be worth your trouble
to use partitioning.  Not to speed searches, particularly, but to make it
easier to drop 1/30th of the data each day.

			regards, tom lane
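
For concreteness, here is roughly what the normalized table looks like once
the serial column is dropped and (Time, Identifier) is promoted to the
primary key, as suggested above.  This is only a sketch: it keeps the quoted
column names from the original post and assumes the values really are all
integers.

CREATE TABLE "DataImport"
(
  "Time"       timestamp without time zone NOT NULL,
  "Identifier" text    NOT NULL,
  "Value"      integer NOT NULL,
  PRIMARY KEY ("Time", "Identifier")
);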
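
The index recommended for the sample query could be created as below (the
index name is illustrative).  Note that if the primary key above were
instead declared with "Identifier" first, its underlying index would already
serve this query and the extra index would be unnecessary.

CREATE INDEX "DataImport_Identifier_Time_idx"
    ON "DataImport" ("Identifier", "Time");

-- the sample query from the original post, which can use the index above
SELECT "Time", "Value"
  FROM "DataImport"
 WHERE "Identifier" = 'ID_ABC'
   AND "Time" BETWEEN '2008-11-07' AND '2008-11-11';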
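
As for partitioning by day so that old data can be dropped cheaply: at the
time of this thread that meant inheritance plus constraint exclusion, but on
a current release (PostgreSQL 11 or later, which also allows the primary key
shown here on a partitioned table) declarative range partitioning is the
simpler route.  A minimal sketch, with illustrative partition names:

-- replaces the non-partitioned CREATE TABLE above
CREATE TABLE "DataImport"
(
  "Time"       timestamp without time zone NOT NULL,
  "Identifier" text    NOT NULL,
  "Value"      integer NOT NULL,
  PRIMARY KEY ("Time", "Identifier")   -- must include the partition column
) PARTITION BY RANGE ("Time");

-- one partition per day, created ahead of time (e.g. by a scheduled job)
CREATE TABLE "DataImport_2008_11_07" PARTITION OF "DataImport"
    FOR VALUES FROM ('2008-11-07') TO ('2008-11-08');

-- dropping a day's worth of data is then a cheap metadata operation,
-- avoiding a bulk DELETE and the vacuuming that would follow it
DROP TABLE "DataImport_2008_11_07";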