Hi Oleg

> btw, take a look on contrib/dict_xsyn, it's more powerful than
> synonym dictionary.

Sorry for the late reply...and thank you for the tip. I will check out
xsyn soon. I am about to finish the third and final chapter of my full
text series, but I could maybe write an "appendix" chapter which mentions
xsyn...or just update my posts.

Cheers,
Tim

> On Sat, May 3, 2014 at 2:26 AM, Tim van der Linden <tim@xxxxxxxxx> wrote:
> > Hi Oleg
> >
> > Haha, understood!
> >
> > Thanks for helping me on this one.
> >
> > Cheers
> > Tim
> >
> >
> > On May 3, 2014 7:24:08 AM GMT+09:00, Oleg Bartunov <obartunov@xxxxxxxxx>
> > wrote:
> >>
> >> Tim,
> >>
> >> you did answer yourself - don't use ispell :)
> >>
> >> On Sat, May 3, 2014 at 1:45 AM, Tim van der Linden <tim@xxxxxxxxx> wrote:
> >>>
> >>> On Fri, 2 May 2014 21:12:56 +0400
> >>> Oleg Bartunov <obartunov@xxxxxxxxx> wrote:
> >>>
> >>> Hi Oleg
> >>>
> >>> Thanks for the response!
> >>>
> >>>> Yes, it's normal for ispell dictionary, think about morphological
> >>>> dictionary.
> >>>
> >>>
> >>> Hmm, I see, that makes sense. I thought the morphological aspect of the
> >>> Ispell only dealt with splitting up compound words, but it also deals with
> >>> deriving the word to a more "stem" like form, correct?
> >>>
> >>> As a last question on this, is there a way to disable this dictionary to
> >>> emit multiple lexemes?
> >>>
> >>>
> >>> The reason I am asking is because in my (fairly new) understanding of
> >>> PostgreSQL's full text it is always best to have as few lexemes as possible
> >>> saved in the vector. This to get smaller indexes and faster matching
> >>> afterwards. Also, if you run a tsquery afterwards to, you can still employ
> >>> the power of these multiple lexemes to find a match.
> >>>
> >>> Or...probably answering my own question...if I do not desire this
> >>> behavior I should maybe not use Ispell and simply use another dictionary :)
> >>>
> >>> Thanks again.
> >>>
> >>> Cheers,
> >>> Tim
> >>>
> >>>> On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden <tim@xxxxxxxxx>
> >>>> wrote:
> >>>>>
> >>>>> Good morning/afternoon all
> >>>>>
> >>>>> I am currently writing a few articles about PostgreSQL's full text
> >>>>> capabilities and have a question about the Ispell dictionary which I
> >>>>> cannot seem to find an answer to. It is probably a very simple issue, so
> >>>>> forgive my ignorance.
> >>>>>
> >>>>> In one article I am explaining about dictionaries and I have setup a
> >>>>> sample configuration which maps most token categories to only use a Ispell
> >>>>> dictionary (timusan_ispell) which has a default configuration:
> >>>>>
> >>>>> CREATE TEXT SEARCH DICTIONARY timusan_ispell (
> >>>>>     TEMPLATE = ispell,
> >>>>>     DictFile = en_us,
> >>>>>     AffFile = en_us,
> >>>>>     StopWords = english
> >>>>> );
> >>>>>
> >>>>> When I run a simple query like "SELECT
> >>>>> to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
> >>>>>
> >>>>> 'smile':1 'smiling':1
> >>>>>
> >>>>> As you can see I get two lexemes with the same pointer.
> >>>>> The question here is: why does this happen?
> >>>>>
> >>>>> Is it normal behavior for the Ispell dictionary to emit multiple
> >>>>> lexemes for a single token? And if so, is this efficient? I
> >>>>> mean, why could it not simply save one lexeme 'smile' which (same as
> >>>>> the snowball dictionary) would match 'smiling' as well if later matched with
> >>>>> the accompanying tsquery?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Cheers,
> >>>>> Tim
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> >>>>> To make changes to your subscription:
> >>>>> http://www.postgresql.org/mailpref/pgsql-general
> >>>
> >>>
> >>>
> >>> --
> >>> Tim van der Linden <tim@xxxxxxxxx>

--
Tim van der Linden <tim@xxxxxxxxx>
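
For anyone reproducing the behavior discussed in this thread, here is a minimal sketch of how the dictionary's output could be inspected and compared with Snowball. It assumes the timusan_ispell dictionary from the original question already exists and that the en_us dictionary and affix files are installed in the server's tsearch_data directory. The configuration name timusan_sketch is hypothetical and was not part of the original setup, and the Snowball fallback in the mapping is only one possible arrangement, not something the thread recommends.

```sql
-- Sketch only: assumes timusan_ispell (from the thread) already exists.
-- The configuration name timusan_sketch is hypothetical.

-- Ask the dictionary directly which lexemes it emits for a single token.
SELECT ts_lexize('timusan_ispell', 'smiling');

-- Build a configuration that consults Ispell first and falls back to the
-- built-in Snowball stemmer for words Ispell does not recognize.
CREATE TEXT SEARCH CONFIGURATION timusan_sketch (COPY = pg_catalog.english);

ALTER TEXT SEARCH CONFIGURATION timusan_sketch
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH timusan_ispell, english_stem;

-- Show, per token, which dictionary handled it and the lexemes it produced.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('timusan_sketch', 'smiling');

-- For comparison: the Snowball-only built-in configuration.
SELECT to_tsvector('english', 'smiling');
```

If the goal is one lexeme per token, as asked above, the last query is the relevant comparison: per Oleg's answer, the way to get that is to not use Ispell for those token types.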