Naz, I posted a link to the dict_regex dictionary for tsearch2:
http://lynx.sao.ru/~karpov/software/postgres_dict_regex.html
Feel free to test it and send us feedback. It's rather general, since it
uses regular expressions (the PCRE library).
Oleg
On Thu, 26 Jul 2007, Naz Gassiep wrote:
I think you might need to write a custom lexer to divide the strings
into meaningful units. If there are subsections of these names that
make sense to search for, then tsearch2 can certainly handle the
mechanics of that, but I doubt that the standard rules will divide
these names into lexemes usefully.
A custom lexer for tsearch2 that recognized chemistry-related lexical
components (di-, tetra-, acetyl-, ethan-, -oic, -ane, -ene etc) would
*hugely* increase the out-of-the-box applicability of PostgreSQL to
scientific applications. Perhaps such an effort could be coordinated with
physics-based and biology-related lexers to provide a unified lexer
offering full scientific capabilities, in the way that PostGIS provides
unified geospatial capabilities.
I don't know how best to bring such an effort about, but I do know that if
such a thing were created it would be a boon for PostgreSQL, giving it a very
significant leg up in terms of functionality, not to mention the great
positive impact that the wide, free availability of such a tool would have on
the scientific research community.
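To make the idea of splitting names into chemistry-related components
concrete, here is a minimal sketch in Python. It is not tsearch2 or
dict_regex code; the function name split_name and the tiny MORPHEMES list
(taken only from the components mentioned above) are assumptions made for
illustration, and a real lexer would need a much larger, curated vocabulary.

```python
import re

# Hypothetical, deliberately tiny vocabulary: only the components named in
# the message above. A usable lexer would need far more entries.
MORPHEMES = ["di", "tetra", "acetyl", "ethan", "oic", "ane", "ene"]

# Put longer alternatives first so the regex prefers the longest morpheme
# when several could match at the same position.
_PATTERN = re.compile(
    "|".join(sorted((re.escape(m) for m in MORPHEMES), key=len, reverse=True))
)


def split_name(name):
    """Split a name into known morphemes, keeping unknown text as its own token."""
    lowered = name.lower()
    tokens = []
    pos = 0
    for match in _PATTERN.finditer(lowered):
        if match.start() > pos:
            # Text between recognized morphemes is kept unchanged.
            tokens.append(lowered[pos:match.start()])
        tokens.append(match.group())
        pos = match.end()
    if pos < len(lowered):
        tokens.append(lowered[pos:])
    return tokens


if __name__ == "__main__":
    for example in ("ethanoic", "Diacetyl", "ethene"):
        print(example, "->", split_name(example))
```

With this vocabulary, "ethanoic" splits into "ethan" and "oic", while the
unrecognized stem in "ethene" is kept as a separate "eth" token. The sketch
is purely lexical: it matches left to right with no chemical knowledge, so
it will split some names in ways a chemist would not.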
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83