On Apr 9, 2006, at 0:31 , Daniel McBrearty wrote:
Hi
I have a website that has multilingual text stored in the database.
Currently I just have a flat table (lets called it "translations"),
one row per text item, one column per language. This works OK, for
now, but I am looking at a redesign. Mostly I want to keep
information about the languages in teh db as well, so that look
like an extra table, one row per lang.
The problem that now arises is that there is an expected
correlation between the "languages" and "translations" tables -
there should be a row in languages for each column of translations.
AFAIK (could well be wrong, I am no expert in db theory) there is
no real way to express in the ddl. Not ideal.
An alternative layout would now be to lose the "translations"
table, and have two tables in place; one called "base_text"
containing the text to be translated, and another called, say,
"tx_text" which contains the translations. Each row of "tx_text"
references both "base_text" and also "languages".
This looks like a nice layout, as there is an abstract rep of the
languages, and we lose the "translations" table which can get very
wide. It's nice that the schema doesn't actually change to add a
new language.
BUT there are certain invariants that need to be enforced. The main
one is this:
There must only be one row in "site_text" for any given language
referencing a given row of "base_text".
You'd want a unique index on (base_text, language), like:
create table tx_text
(
base_text text not null
references base_text(base_text)
, language text not null
references languages (language)
, primary key (base_text, language)
, tx_text text not null
);
I would also like to have a column in "languages", type boolean,
called "is_base" - this says what the base language is. Here, only
ONE row can have a true value. (Obviously it has default value of
false and is not null).
Here you want a partial unique index on languages where is_base is true
create table languages
(
language text primary key
, is_base boolean not null
);
create unique index languages_only_one_true_base_idx
on languages (is_base)
where is_base;
Another invariant now comes in - the language referenced by every
row of "site_text" MUST have "is_base" set to FALSE.
I can think of two ways to do this: one is to write a trigger to
enforce this, something like:
create function non_base_language_translations_check
returns trigger()
language plpgsql as $$
begin
if exists (
select *
from tx_text
natural join languages
where not is_base
)
then
raise exception
'Language of translation text must not be a base language.';
end if;
end;
$$;
Then use this function on triggers that fire on insert and update on
tx_text and on update on languages.
Another is to include the is_base column in tx_text (with both
language and is_base referencing languages) and use a check
constraint to make sure is_base is false:
create table tx_text
(
base_text text not null
references base_text(base_text)
, language text not null
, is_base boolean not null check not is_base
, foreign key (language, is_base) references languages (language,
is_base)
on update cascade
, primary key (base_text, language)
, tx_text text not null
);
The latter method is denormalized, which is not something I usually
recommend. I don't know how the constraint checking overhead of using
a trigger compares with using the foreign key and check constraint.
A third idea would be to have two languages tables: one with a single
row for the base language and another for the target languages (You
might even do this using table inheritance, though I haven't thought
this completely through). tx_text would reference the
target_languages table (or child table, as the case may be).
I'm sure others have opinions on this as well.
Hope this helps.
Michael Glaesemann
grzm myrealbox com