This may be a totally bad idea:
Explode your sentences into (sentence_number, one_word) rows, one row per word (this makes a big table; you may want to partition it).
Then put classic indexes on sentence_number and on one_word (a btree if you do = comparisons; something more subtle if you do "LIKE 'word'").
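A minimal sketch of that exploded layout (table and column names are illustrative, not from the original post; it assumes a sentences(id, sentence) table and whitespace tokenization):

```sql
-- Exploded layout: one row per (sentence, word) pair.
CREATE TABLE sentence_words (
    sentence_number integer NOT NULL,
    one_word        text    NOT NULL
);

-- Populate by splitting each sentence on whitespace.
INSERT INTO sentence_words (sentence_number, one_word)
SELECT s.id, w.word
FROM sentences s,
     LATERAL regexp_split_to_table(lower(s.sentence), '\s+') AS w(word);

-- Classic btree indexes for = comparisons.
CREATE INDEX ON sentence_words (sentence_number);
CREATE INDEX ON sentence_words (one_word);
```

Lower-casing (and optionally stemming via a text search dictionary) at load time keeps the = comparisons cheap at query time.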
Depending on performance, it could be worth it to regroup by word:
(sentence_numbers[], one_word)
Then you could try an array or hstore index on sentence_numbers[]?
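The regrouped form could be sketched like this (again, names are illustrative assumptions; it builds on the exploded table above):

```sql
-- One row per distinct word, carrying every sentence that contains it.
CREATE TABLE word_sentences AS
SELECT one_word,
       array_agg(DISTINCT sentence_number) AS sentence_numbers
FROM sentence_words
GROUP BY one_word;

CREATE INDEX ON word_sentences (one_word);

-- Candidate sentences sharing at least one word with the query words:
SELECT DISTINCT unnest(sentence_numbers) AS sentence_number
FROM word_sentences
WHERE one_word IN ('similar', 'words');
```

Ranking the candidates by how many query words they share would then approximate a word-overlap similarity without scanning the full sentence table.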
Cheers,
Rémi-C
2013/12/5 Janek Sendrowski <janek12@xxxxxx>
Hi,
I have tables with millions of sentences, one sentence per row. The sentences are natural language and any language is possible, but all sentences within one table share the same language.
I have to do a similarity search on them. It has to be very fast, because I have to search for a few hundred sentences many times.
The search shouldn't be context-based. It should just find sentences with similar words (maybe stemmed).
I already tried gist/gin-index-based trigram search (the pg_trgm extension), full-text search (the tsearch2 extension), and pivot-based indexing (Fixed Query Array), but they are all too slow or not suitable.
Soundex and Metaphone aren't suitable either.
I have been working on this project for a long time, but without any success.
Do any of you have an idea?
I would be very thankful for help.
Janek Sendrowski
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general