I broached this topic last year[1], but the project was tabled until now, so I'm raising it again. We want to be able to search text (extracted from character-based PDF files) which will contain legal terms and statute cites, and we want to be able to do tsearch2 searches (under 8.3.recent). It's clear enough how to create a dictionary to gracefully handle the legal terms, but I'm less sure about the statute cites.

I got one response[2], which mentioned a prefix search in the 8.4 release, and provided a link to a Perl regular-expression-based dictionary. I'm wondering if anyone has feedback on either of these techniques, and whether they might work for our needs.

I'm not sure I adequately described our needs, so I'll fill that out a little more. People are likely to search for statute cites, which tend to have a hierarchical form. I'm not sure the prefix approach will work for this. For example, there is a section 939.64 in the state statutes dealing with commission of a crime while wearing a bulletproof garment. If someone searches for that, they should find subsections like 939.64(1) or 939.64(2), but not different sections which start with the same characters, like 939.641 (the section on concealing identity) or 939.645 (the section on hate crimes). A search for chapter 939 should return any of the above.

Of course, we want someone to be able to search on 939.64, 939.641, and 939.645 together and get only documents which reference all three (i.e., to look for a document referring to a hate crime committed while concealing identity and wearing a bulletproof garment).

Suggestions welcome on how to handle this user requirement.

-Kevin

[1] http://archives.postgresql.org/pgsql-admin/2008-06/msg00033.php
[2] http://archives.postgresql.org/pgsql-admin/2008-06/msg00034.php

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
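
P.S. To make the matching rules above concrete, here is a minimal sketch in Python. It is not tsearch2 code and not a tested dictionary; it only illustrates one possible approach, in which each cite found in a document is expanded into itself plus every ancestor in the hierarchy, so that a search becomes an exact match on a term rather than a prefix match. The regular expressions and the names index_terms and matches are hypothetical, and the pattern assumes cites always look like chapter.section followed by optional parenthesized subsections (it would also pick up ordinary decimal numbers, which a real implementation would have to deal with).

```python
import re

# Assumed cite shape: chapter.section, then zero or more "(x)" subsections,
# e.g. "939.64", "939.64(1)", "939.64(2)(a)". Hypothetical pattern.
DOC_CITE_RE = re.compile(r"(?<![\w.])(\d+)\.(\d+)((?:\([0-9a-z]+\))*)")
SUB_RE = re.compile(r"\([0-9a-z]+\)")


def index_terms(text):
    """Return every cite in the text together with all of its ancestors."""
    terms = set()
    for m in DOC_CITE_RE.finditer(text):
        chapter, section, subs = m.groups()
        terms.add(chapter)                    # chapter, e.g. "939"
        current = f"{chapter}.{section}"
        terms.add(current)                    # section, e.g. "939.64"
        for sub in SUB_RE.findall(subs):
            current += sub
            terms.add(current)                # e.g. "939.64(1)"
    return terms


def matches(text, *queries):
    """True if the text references every queried cite, exactly or via a descendant."""
    terms = index_terms(text)
    return all(q in terms for q in queries)


doc_a = "Charged under 939.64(1) and 939.645."
doc_b = "See 939.641 regarding concealed identity."
doc_c = "Counts cite 939.64(2), 939.641 and 939.645."

print(matches(doc_a, "939.64"))    # subsection 939.64(1) counts as 939.64
print(matches(doc_b, "939.64"))    # 939.641 is a different section
print(matches(doc_b, "939"))       # chapter search finds any section in it
print(matches(doc_c, "939.64", "939.641", "939.645"))
```

Because 939.641 only ever produces the terms 939 and 939.641, it can never satisfy a search for 939.64, which is the distinction a plain prefix search would seem to miss.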