Stefan Schwarzer wrote: > Ok, I do understand that. > > So, instead of the earlier mentioned database design, I would have > something like this: > > - one table for the country names/ids/etc. (Afghanistan, 1; Albania, > 2....) > - one table for the variable names/ids/etc. (GDP, 1; Population, 2; > Fish Catch, 3;....) > - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3; > ....) If you _do_ need this table (because you want to constrain your statistical data to only contain a specific set of years, or because you need a quick list of available years to select from): Make the year primary key and drop the artificial index. Years are perfectly fine data to constrain on, and it saves you the joins with that table (the foreign key constraint constrains your data sufficiently). You may want to constrain your years further by adding a check constraint, fe. CHECK (year > 1900). The same argument goes for the country names, but I generally don't like to have text data for primary key. Contrary to years, their values might change (for which an ON UPDATE CASCADE FK constraint trigger would have to go through your data to update all the references). Plus they take up a bit more space than integers, although with country names that won't matter that much. OTOH... If your country names contain non-ascii characters and the database is configured to collate on those properly, it will be slower. That aside (they're kind of minor points), the country name is also a good candidate to become key (primary and foreign respectively). > and > - one table for all "statistical data" with four fields - > id_variable, id_country, id_year, and the actual value Some things may have changed here due to my above suggestions. > It seems to me more difficult now to produce a non-normalized output > based on the normalized table. How would look a query like, if I need > now to SELECT, say 100 countries and 20 years? Something like this > (simplified and without joins): > > SELECT > value, > id.year > FROM > main_table > WHERE > year = '1970' OR > year = '1971' OR > .... > country_name = 'Afghanistan' OR > country_name = 'Albania' OR > ... I don't really see what the problem is; with the years as columns in your table they're just in a different places in this query. Without knowing what problem you're trying to solve with this query it's a bit of a guess. I assume those years and countries are user input? I usually use BETWEEN or IN (...) for such cases, but that boils down to the same query. It's just shorter to write. BTW, You didn't actually use type text for your year column, did you? No quotes needed then. Otherwise you'd have to make sure your year values are all the same length or sorting gets... interesting. SELECT value, year FROM main_table WHERE year BETWEEN 1970 AND 1975 AND country_name IN ('Afghanistan', 'Albania', ...) But I think you'll usually be looking for aggregates, so GROUP BY and HAVING are probably more appropriate for your queries. SELECT country_name, year, SUM(value) FROM main_table WHERE country_name IN ('Afghanistan', 'Albania', ...) AND year BETWEEN 1970 AND 1975 GROUP BY country_name, year HAVING SUM(value) > 0; -- Alban Hertroys alban@xxxxxxxxxxxxxxxxx magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster