Search Postgresql Archives

Re: Error while loading sql file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



HI Ardash!

INSERT INTO conceptnet_frame VALUES(3884,'ja','{1}は{2}を持っている。',16,3,2140,NULL,NULL,NULL);
....

Can you still access the database that produced the dump? If so, you may want to produce a number of dumps for distinct language values. Japanese, in particular, is a very complicated issue, as Japanese change alphabet to produce the same effect Latin writers get by using bold and italic. So text searches are quite a nightmare, since DOG is written in different alphabets (Latin included), depending on what the author had in mind, in terms of underlining this specific word/_expression_.

If you could manage to insulate single languages things would definitely get less complex, as you can track down specific problems related to a single language (or maybe to an external source that inserted weird encodings). If the original DB is not available you could still filter the dump file for such values (although this is a very boring experience). 

One of the good things with this approach (in case you need to do heavy text searches on your data) is that (as of PG 9.1) you can model structures like:

CREATE COLLATION "de_DE.utf8" (
    LC_COLLATE = "de_DE.utf8", 
    LC_CTYPE = "de_DE.utf8" 
);
CREATE COLLATION "en_GB.utf8" (
    LC_COLLATE = "en_GB.utf8", 
    LC_CTYPE = "en_GB.utf8" 
);
# NOTE!! all these locales must pre-exist on your box, you are simply importing them into PG with this CREATE phase.

CREATE TABLE conceptnet_frame_root (
  id            BIGINT NOT NULL PRIMARY KEY,
  ISO693_1_code CHAR(2) NOT NULL
);

CREATE TABLE conceptnet_frame_en (
  linguistic_content TEXT COLLATE "en_GB.utf8" NOT NULL 
) INHERITS (conceptnet_frame_root);

CREATE TABLE conceptnet_frame_de (
  linguistic_content TEXT COLLATE "de_DE.utf8" NOT NULL
) INHERITS (conceptnet_frame_root);

Then have a view pick up the entire linguistic bouquet, if needed. Sadly you cannot leave 'linguistic_content' in the root table, because you won't be able to use an alter table on it later, to alter the collation at single inherited table level, so you are responsible of ensuring type consistency on your own. It does add complication to the model, however, if you are into serious conflicts among different languages this is the best solution I could come up with, thus far.

You may want to have a look at:
http://www.postgresql.org/docs/9.1/static/collation.html

BTW, ISO 639-1 is a dangerous standard to use, if you are to make extensive language coverage, see
http://en.wikipedia.org/wiki/ISO_639 

Hope this helps
Bèrto

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux