Re: Normalized Tables & SELECT [was: Find "smallest common year"]

Stefan Schwarzer skrev:
>>> What would you recommend for say, 500 global national statistical
>>> variables,
>>> 500 regional and 500 subregional and 500 global aggregations? Years
>>> being
>>> covered having something between 10 and 60 years for each of these
>>> variables. All available for 240 countries/territories.
>>
>> I generally approach such problems by putting the data right
>> (normalized) at the start, then munging the data into summary tables
>> to handle the problems you're seeing now.
>>
>> I find it far easier to maintain normalized tables and produce
>> non-normalized ones (for things like data warehousing) than it is to
>> maintain non-normalized tables and try to produce normalized data
>> from that.
> 
> Ok, I do understand that.
> 
> So, instead of the earlier mentioned database design, I would have
> something like this:
> 
>    - one table for the country names/ids/etc. (Afghanistan, 1; Albania,
> 2....)
>    - one table for the variable names/ids/etc. (GDP, 1; Population, 2;
> Fish Catch, 3;....)
>    - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
> ....)
> and
>    - one table for all "statistical data" with four fields -
> id_variable, id_country, id_year, and the actual value

This is one possibility. Another is to have one table for each variable.
This has the benefit of not mixing different units/data types in the
same field. It does mean you cannot use the same (parameterized) query
for getting different measures.

Since it is easy to create views converting from one of these
presentations to the other, which one you choose is not that important.
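
To illustrate the point about views, here is a rough sketch of converting
in both directions. It is only an illustration under several assumptions:
it runs against an in-memory SQLite database through Python's sqlite3
module rather than PostgreSQL (the CREATE VIEW statements are plain SQL
and should carry over), the names stat_data, gdp, population and
stat_data_long are made up for the example, and the values are invented
sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE variable (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

    -- Presentation 1: one long table holding every measure.
    -- (Hypothetical name; the year is stored directly as an integer.)
    CREATE TABLE stat_data (
        id_variable INTEGER NOT NULL REFERENCES variable(id),
        id_country  INTEGER NOT NULL REFERENCES country(id),
        year        INTEGER NOT NULL,
        value       REAL,
        PRIMARY KEY (id_variable, id_country, year)
    );

    -- Invented sample rows, for illustration only.
    INSERT INTO country  VALUES (1, 'Afghanistan'), (2, 'Albania');
    INSERT INTO variable VALUES (1, 'GDP'), (2, 'Population');
    INSERT INTO stat_data VALUES
        (1, 1, 1970, 1.5), (1, 2, 1970, 2.5),
        (2, 1, 1970, 11.0), (2, 2, 1971, 2.2);

    -- Long table -> presentation 2: one relation per variable.
    CREATE VIEW gdp AS
        SELECT id_country, year, value FROM stat_data WHERE id_variable = 1;
    CREATE VIEW population AS
        SELECT id_country, year, value FROM stat_data WHERE id_variable = 2;

    -- Per-variable relations -> long presentation again.
    CREATE VIEW stat_data_long AS
        SELECT 1 AS id_variable, id_country, year, value FROM gdp
        UNION ALL
        SELECT 2 AS id_variable, id_country, year, value FROM population;
""")

gdp_rows = conn.execute(
    "SELECT id_country, year, value FROM gdp ORDER BY id_country, year"
).fetchall()
long_rows = conn.execute(
    "SELECT * FROM stat_data_long ORDER BY 1, 2, 3"
).fetchall()
base_rows = conn.execute(
    "SELECT * FROM stat_data ORDER BY 1, 2, 3"
).fetchall()
```

If the per-variable tables were the real storage instead, only the
UNION ALL view would be needed; the sketch defines both directions on
top of the long table just to show that they round-trip.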

Also, there is no obvious need to have a lookup table for years - just
store the year as an integer in your data table(s). If necessary, add a
constraint indicating which years are valid. You can produce rows for
missing years by left joining with generate_series(start_year, end_year).
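
A sketch of the missing-years idea, with some loud assumptions: it runs
on SQLite via Python's sqlite3 module, which may not ship
generate_series, so a recursive CTE stands in for it here. In PostgreSQL
the "years" relation would presumably be written as
generate_series(1970, 1973) AS years(year). The table name gdp, the
CHECK range and the sample values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gdp (
        id_country INTEGER NOT NULL,
        -- The year is stored directly; the valid range is an assumption.
        year       INTEGER NOT NULL CHECK (year BETWEEN 1970 AND 2030),
        value      REAL,
        PRIMARY KEY (id_country, year)
    );
    -- Invented sample rows: 1971 and 1973 are deliberately missing.
    INSERT INTO gdp VALUES (1, 1970, 1.5), (1, 1972, 1.7);
""")

# Recursive CTE standing in for generate_series(start_year, end_year).
# The country filter sits in the ON clause, not in WHERE, so that years
# without data still come back (with a NULL value).
query = """
    WITH RECURSIVE years(year) AS (
        SELECT :start_year
        UNION ALL
        SELECT year + 1 FROM years WHERE year < :end_year
    )
    SELECT years.year, gdp.value
    FROM years
    LEFT JOIN gdp
           ON gdp.year = years.year
          AND gdp.id_country = :country
    ORDER BY years.year
"""
rows = conn.execute(
    query, {"start_year": 1970, "end_year": 1973, "country": 1}
).fetchall()

# The CHECK constraint rejects a year outside the declared range.
try:
    conn.execute("INSERT INTO gdp VALUES (1, 1960, 0.9)")
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True
```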

Even if you choose to store the valid years in a table, the id_year is
unnecessary - just use the year itself as the primary key.
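
If you do keep a table of valid years, it could look roughly like the
sketch below, where the year itself is the primary key and the data
table references it directly. Again this is only an illustration run on
SQLite through Python's sqlite3 module; the names valid_year and gdp and
the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- No surrogate id_year: the year is its own key.
    CREATE TABLE valid_year (year INTEGER PRIMARY KEY);

    CREATE TABLE gdp (
        id_country INTEGER NOT NULL,
        year       INTEGER NOT NULL REFERENCES valid_year(year),
        value      REAL,
        PRIMARY KEY (id_country, year)
    );

    -- Invented sample rows.
    INSERT INTO valid_year VALUES (1970), (1971);
    INSERT INTO gdp VALUES (1, 1970, 1.5);
""")

# A row for a year that is not listed in valid_year is refused.
try:
    conn.execute("INSERT INTO gdp VALUES (1, 1999, 9.9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

years = [row[0] for row in conn.execute("SELECT year FROM gdp ORDER BY year")]
```

Queries on gdp can then filter or group on the year column without
joining to the lookup table at all.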

More in another reply.

Nis