>>> Oleg Bartunov <oleg@xxxxxxxxxx> wrote:
> On Tue, 10 Mar 2009, Tom Lane wrote:
>> "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes:
>>> People are likely to search for statute cites, which tend to have a
>>> hierarchical form. I'm not sure the prefix approach will work for
>>> this. For example, there is a section 939.64 in the state statutes
>>> dealing with commission of a crime while wearing a bulletproof
>>> garment. If someone searches for that, they should find subsections
>>> like 939.64(1) or 939.64(2) but not different sections which start
>>> with the same characters like 939.641 (the section on concealing
>>> identity) or 939.645 (the section on hate crimes). A search for
>>> chapter 939 should return any of the above.
>>
>> Perhaps you could pass the texts and the queries through a regexp
>> substitution that converts digit-dot-digit to digit-dash-digit?
>
> perhaps, for 8.4 it's better to utilize prefix search, like
> to_tsquery('939.645:*') will find what Kevin needs. The problem is with
> parser, so I'd preprocess text before indexing to convert all
> digit.digit(digit) to digit.digit.digit, which is what parser recognizes as
> a single lexeme 'version'. Here is just an illustration
>
> qq=# select * from ts_parse('default',translate('939.64(1)','()','. '));
>  tokid |  token
> -------+----------
>      8 | 939.64.1
>     12 |
>
> btw, having 'version' it's possible to use dict_regex for 8.3.

Tom, Oleg:

Thanks for the suggestions. Looks promising.

-Kevin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
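
For anyone who wants to try the preprocessing idea outside the database first, here is a minimal sketch in Python. It assumes cites have the shape digit.digit(digit), as in the examples in this thread. The names normalize_cites and cite_matches are hypothetical helpers made up for illustration; they are not part of PostgreSQL or tsearch. The second function only spells out the hierarchy rule described above (939.64 covers 939.64(1) but not 939.641), and it makes no claim about how a tsquery prefix match behaves.

```python
import re

# Hypothetical helper, not a PostgreSQL function: matches a parenthesized
# run of digits that directly follows a digit, e.g. the "(1)" in "939.64(1)".
_SUBSECTION = re.compile(r"(?<=\d)\((\d+)\)")


def normalize_cites(text: str) -> str:
    """Rewrite digit(digit) as digit.digit, so 939.64(1) becomes 939.64.1."""
    return _SUBSECTION.sub(r".\1", text)


def cite_matches(query: str, cite: str) -> bool:
    """Return True if cite equals query or sits below it in the hierarchy.

    Both values are normalized and split on dots, then compared component
    by component, so "939.64" does not match "939.641".
    """
    q = normalize_cites(query).split(".")
    c = normalize_cites(cite).split(".")
    return c[:len(q)] == q


if __name__ == "__main__":
    for cite in ("939.64(1)", "939.64(2)", "939.641", "939.645"):
        print(cite, normalize_cites(cite), cite_matches("939.64", cite))
```

The sketch handles only a single level of parenthesized digits; deeper nesting such as 939.64(1)(2) or lettered paragraphs would need a wider pattern.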