Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Alban Hertroys <a.hertroys@xxxxxxxxxxxxxxxxx> · Fri, 28 Sep 2007 11:04:01 +0200

Stefan Schwarzer wrote:
> Ok, I do understand that.
> 
> So, instead of the earlier mentioned database design, I would have
> something like this:
> 
>    - one table for the country names/ids/etc. (Afghanistan, 1; Albania,
> 2....)
>    - one table for the variable names/ids/etc. (GDP, 1; Population, 2;
> Fish Catch, 3;....)

>    - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
> ....)

If you _do_ need this table (because you want to constrain your
statistical data to only contain a specific set of years, or because you
need a quick list of available years to select from): Make the year
primary key and drop the artificial index.

Years are perfectly fine data to constrain on, and it saves you the
joins with that table (the foreign key constraint constrains your data
sufficiently).

You may want to constrain your years further by adding a check
constraint, fe. CHECK (year > 1900).

The same argument goes for the country names, but I generally don't like
to have text data for primary key.

Contrary to years, their values might change (for which an ON UPDATE
CASCADE FK constraint trigger would have to go through your data to
update all the references). Plus they take up a bit more space than
integers, although with country names that won't matter that much.

OTOH... If your country names contain non-ascii characters and the
database is configured to collate on those properly, it will be slower.

That aside (they're kind of minor points), the country name is also a
good candidate to become key (primary and foreign respectively).

> and
>    - one table for all "statistical data" with four fields -
> id_variable, id_country, id_year, and the actual value

Some things may have changed here due to my above suggestions.

> It seems to me more difficult now to produce a non-normalized output
> based on the normalized table. How would look a query like, if I need
> now to SELECT, say 100 countries and 20 years? Something like this
> (simplified and without joins):
> 
>     SELECT
>             value,
>             id.year
>     FROM
>             main_table
>     WHERE
>             year = '1970' OR
>             year = '1971' OR
>             ....
>             country_name = 'Afghanistan' OR
>              country_name = 'Albania' OR
>             ...

I don't really see what the problem is; with the years as columns in
your table they're just in a different places in this query.

Without knowing what problem you're trying to solve with this query it's
a bit of a guess. I assume those years and countries are user input?

I usually use BETWEEN or IN (...) for such cases, but that boils down to
the same query. It's just shorter to write.

BTW, You didn't actually use type text for your year column, did you? No
quotes needed then. Otherwise you'd have to make sure your year values
are all the same length or sorting gets... interesting.

SELECT
	value,
	year
FROM
	main_table
WHERE
	year BETWEEN 1970 AND 1975
	AND country_name IN ('Afghanistan', 'Albania', ...)

But I think you'll usually be looking for aggregates, so GROUP BY and
HAVING are probably more appropriate for your queries.

SELECT country_name, year, SUM(value)
  FROM main_table
 WHERE country_name IN ('Afghanistan', 'Albania', ...)
   AND year BETWEEN 1970 AND 1975
 GROUP BY country_name, year
 HAVING SUM(value) > 0;

-- 
Alban Hertroys
alban@xxxxxxxxxxxxxxxxx

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster