Re: Fastest Index/Algorithm to find similar sentences

On Fri, Aug 2, 2013 at 10:25 AM, Kevin Grittner <kgrittn@xxxxxxxxx> wrote:
> Janek Sendrowski <janek12@xxxxxx> wrote:
>
>> I also tried pg_trgm module, which works with tri-grams, but it's
>> also very slow with 100.000+ rows.
>
> Hmm.  I found the pg_trgm module very fast for name searches with
> millions of rows *as long as I used KNN-GiST techniques*.  Were you
> careful to do so?  Check out the "Index Support" section of this
> page:
>
> http://www.postgresql.org/docs/current/static/pgtrgm.html
>
> While I have not tested this technique with a column containing
> sentences, I would expect it to work well.  As a quick
> confirmation, I imported the text form of War and Peace into a
> table, with one row per *line* (because that was easier than
> parsing sentence boundaries for a quick test).  That was over
> 65,000 rows.

+1 to this.  pg_trgm is black magic.  Search time (when using an index)
depends mostly on the number of trigrams in the search string versus
the average number of trigrams in the database.
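
To make the trigram counts concrete, here is a rough Python sketch of how
pg_trgm splits a string into trigrams and scores similarity.  It follows
the documented rule that each word is treated as having two spaces
prefixed and one space suffixed, but it is only an approximation: the
function names trigrams() and similarity() are illustrative, and treating
only ASCII letters and digits as word characters is an assumption made
here, not the module's exact behaviour.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram set for a string.

    Assumption: words are runs of ASCII letters/digits, lowercased.
    Each word is padded with two leading spaces and one trailing space.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the size of the union of both sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(len(trigrams("war")), len(trigrams("war and peace")))
    print(similarity("war and peace", "war and piece"))
```

A whole sentence produces many more trigrams than a single name, so, if
the observation above holds, an indexed search on it has more trigrams to
look up.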

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



