Hello to all experts,

I am considering using the pg_trgm extension in a research publication, since initial results seem promising. The index seems to work pretty fast for finding similar text and significantly reduces query time. The problem is that I do not know the theory behind it or the exact method it uses.

My questions:

a) It probably uses the q-grams method (basically 3-grams only). Does it also create 2-grams and 1-grams to determine similarity?

b) About the index (either GiST or GIN): is it based on an RD-tree? If not, what is the exact indexing method it uses?

c) Will it work for any UTF8 characters / strings? I ask because the documentation only mentions ASCII.

d) I also found the http://pgsimilarity.projects.pgfoundry.org/ project, which provides similarity functions for strings. Does the pg_trgm extension have anything to do with that? Since pgsimilarity seems abandoned, is there another project that 1) uses some kind of indexing for similarity and 2) provides most of the string similarity functions that pgsimilarity does?

Thanks