Hi, On 14.07.2016 01:16, Stefan Keller wrote:
Hi, I have a text corpus which contains either German or English docs and I expect queries where I don't know if it's German or English. So I'd like e.g. that a query "forest" matches "forest" in body_en but also "Wald" in body_de. I created a table with attributes body_en and body_de (type "text"). I will use ts_vector/ts_query on the fly (don't need yet an index (attributes)). * Can FTS handle this multilingual situation?
In my opinion, PostgreSQL can't handle it. It can't translate words from one language to another; it only stems a word from its original form to its base form. You would first need to translate the word from English to German, and then search for the translated word in the body_de attribute.
The issue is further complicated by the fact that one word can have several different meanings in the other language.
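To make the difference between stemming and translation concrete, here is a minimal sketch. It assumes a hypothetical table docs(id, body_en, body_de), and it assumes the application has already translated the English term 'forest' to 'Wald' outside of PostgreSQL. The built-in english and german configurations only normalize word forms within their own language.

```sql
-- Stemming reduces word forms within one language; nothing is translated.
SELECT to_tsvector('english', 'forests') AS en_lexemes,
       to_tsvector('german',  'Wälder')  AS de_lexemes;

-- Hypothetical table: docs(id, body_en, body_de).
-- The German term 'Wald' is assumed to come from the application,
-- not from PostgreSQL.
SELECT id
FROM docs
WHERE to_tsvector('english', coalesce(body_en, '')) @@ to_tsquery('english', 'forest')
   OR to_tsvector('german',  coalesce(body_de, '')) @@ to_tsquery('german',  'Wald');
```

Each column is parsed with the configuration of its own language, so the English and German terms are stemmed by the matching stemmer before comparison.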
* How to setup a text search configuration which e.g. stems en and de words? * Should I create a synonym dictionary which contains word translations en-de instead of synonyms en-en?
Such a synonym dictionary would contain thousands of entries, so building it would require a great deal of effort.
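For reference, a translation dictionary of this kind could be wired up roughly as follows. This is only a sketch under assumptions: the file name en_de.syn, the dictionary name en_de_syn, the configuration name en_de, and the table docs(id, body_de) are all hypothetical, and the file would have to be written by hand and placed in $SHAREDIR/tsearch_data.

```sql
-- Assumes a file en_de.syn in $SHAREDIR/tsearch_data with one
-- "english german" pair per line, for example:
--   forest wald
CREATE TEXT SEARCH DICTIONARY en_de_syn (
    TEMPLATE = synonym,
    SYNONYMS = en_de
);

-- Start from the built-in English configuration.
CREATE TEXT SEARCH CONFIGURATION en_de (COPY = english);

-- Consult the translation dictionary first, then fall back to the
-- English stemmer for words that have no entry.
ALTER TEXT SEARCH CONFIGURATION en_de
    ALTER MAPPING FOR asciiword
    WITH en_de_syn, english_stem;

-- Map the English query term to a German lexeme and search the
-- German column (hypothetical table docs).
SELECT id
FROM docs
WHERE to_tsvector('german', coalesce(body_de, '')) @@ to_tsquery('en_de', 'forest');
```

A word matched by the synonym dictionary is not passed on to the later dictionaries, so the German side of each entry would need to be the lexeme that the german configuration produces for that word. Inflected English forms such as "forests" would also need their own entries, since the synonym lookup happens before stemming. Both points add to the effort described above.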
* Any hints to related work where FTS has been used in a multilingual context? :Stefan
-- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company