On Thu, Sep 15, 2016 at 5:46 AM, Jaap Roes <jroes@xxxxxxxxxxx> wrote:
I've got a table that stores some multilingual content:
CREATE TABLE search (
content text NOT NULL,
language regconfig NOT NULL,
fulltext tsvector
);
CREATE INDEX search_fulltext ON search USING GIN(fulltext);
INSERT INTO search (language, content) VALUES
('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen vliegen achterna'),
('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali op zijn kale koeli-kop.'),
('dutch', 'Moeder sneed zeven scheve sneden brood'),
('english', 'I saw Susie sitting in a shoe shine shop. Where she sits she shines, and where she shines she sits.'),
('english', 'How can a clam cram in a clean cream can?'),
('english', 'Can you can a can as a canner can can a can?');
UPDATE search SET fulltext = to_tsvector(language, content);
To make sure I always search in the correct language I use these queries:
SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
(1 row)
I don't think that what you want is possible to do in a clean way. It can't know what the language for a row is until it finds the row, but it can't find the row using an index until it runs to_tsquery, and it can't run to_tsquery until it knows the language.
You would probably need to run the query once for each language, and filter out the results which it found under the 'wrong' language.
SELECT * FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine') and language = 'dutch'::regconfig
union all
SELECT * FROM search WHERE fulltext @@ to_tsquery('english', 'shine') and language = 'english'::regconfig;
If you have more than two languages, this will become tedious, and perhaps non-performant. I am not aware of a more general solution, though.
It might help to build partial indexes on each language.
Cheers,
Jeff