Search Postgresql Archives

Re: advice on schema for multilingual text

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Apr 9, 2006, at 0:31 , Daniel McBrearty wrote:

Hi

I have a website that has multilingual text stored in the database. Currently I just have a flat table (lets called it "translations"), one row per text item, one column per language. This works OK, for now, but I am looking at a redesign. Mostly I want to keep information about the languages in teh db as well, so that look like an extra table, one row per lang.

The problem that now arises is that there is an expected correlation between the "languages" and "translations" tables - there should be a row in languages for each column of translations. AFAIK (could well be wrong, I am no expert in db theory) there is no real way to express in the ddl. Not ideal.

An alternative layout would now be to lose the "translations" table, and have two tables in place; one called "base_text" containing the text to be translated, and another called, say, "tx_text" which contains the translations. Each row of "tx_text" references both "base_text" and also "languages".

This looks like a nice layout, as there is an abstract rep of the languages, and we lose the "translations" table which can get very wide. It's nice that the schema doesn't actually change to add a new language.

BUT there are certain invariants that need to be enforced. The main one is this:

There must only be one row in "site_text" for any given language referencing a given row of "base_text".

You'd want a unique index on (base_text, language), like:

create table tx_text
(
	base_text text not null
		references base_text(base_text)
	, language text not null
		references languages (language)
	, primary key (base_text, language)
	, tx_text text not null
);

I would also like to have a column in "languages", type boolean, called "is_base" - this says what the base language is. Here, only ONE row can have a true value. (Obviously it has default value of false and is not null).

Here you want a partial unique index on languages where is_base is true
create table languages
(
	language text primary key
	, is_base boolean not null
);

create unique index languages_only_one_true_base_idx
on languages (is_base)
where is_base;


Another invariant now comes in - the language referenced by every row of "site_text" MUST have "is_base" set to FALSE.

I can think of two ways to do this: one is to write a trigger to enforce this, something like:

create function non_base_language_translations_check
returns trigger()
language plpgsql as $$
begin
	if exists (
		select *
		from tx_text
		natural join languages
		where not is_base
		)
	then
		raise exception
		'Language of translation text must not be a base language.';
	end if;
end;
$$;

Then use this function on triggers that fire on insert and update on tx_text and on update on languages.

Another is to include the is_base column in tx_text (with both language and is_base referencing languages) and use a check constraint to make sure is_base is false:

create table tx_text
(
	base_text text not null
		references base_text(base_text)
	, language text not null
	, is_base boolean not null check not is_base
, foreign key (language, is_base) references languages (language, is_base)
		on update cascade
	, primary key (base_text, language)
	, tx_text text not null
);

The latter method is denormalized, which is not something I usually recommend. I don't know how the constraint checking overhead of using a trigger compares with using the foreign key and check constraint.

A third idea would be to have two languages tables: one with a single row for the base language and another for the target languages (You might even do this using table inheritance, though I haven't thought this completely through). tx_text would reference the target_languages table (or child table, as the case may be).

I'm sure others have opinions on this as well.

Hope this helps.

Michael Glaesemann
grzm myrealbox com





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux