tsearch2 on-demand dictionary loading & using functions in tsearch2

iSteve <isteve@xxxxxxx> · Sat, 17 May 2008 15:43:27 +0200

Hello,

I'd like to ask about two separate things regarding tsearch2 in 
PostgreSQL 8.3.

First, I've noticed that a dictionary is loaded on demand, separately 
for each session, and apparently this behavior cannot be changed in any way.

If that's the case, would it be reasonable to ask for an option to load 
dictionaries during Postgres startup, rather than at the first use of 
the dictionary in each distinct session?

I am currently working with ispell dictionaries for multiple languages, 
each approx. 3 MB in size. With a lookup within a single dictionary, 
the first ts_lexize call in a session takes over one second, which from 
a user's point of view is quite a long time.

I see several benefits of the suggested approach:
 * For those who do not use persistent connections of any sort, using 
ispell dictionaries right now severely hurts application 
responsiveness. Loading the dictionaries during database startup instead 
would speed things up significantly.
 * Considering the dictionary is loaded separately for each session, does 
this also imply that each running backend has a separate copy of the 
dictionary stored in memory? If that is the case, using e.g. 2 
dictionaries, each 3 MB in size, on a database server with 20 backends 
running would eat up as much as 120 MB of RAM, while if the server loaded 
the dictionaries beforehand, the OS could (possibly) keep the dictionaries 
shared in memory.
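
To make the arithmetic explicit, here is a minimal C sketch. The function 
names are hypothetical, and it assumes a loaded dictionary occupies about 
as much memory as its file on disk, which I have not verified.

```c
#include <stddef.h>

/*
 * Rough estimate only. Assumption: the in-memory size of a dictionary
 * is about the same as its size on disk.
 */

/* Every backend keeps its own copy of every dictionary. */
size_t per_backend_total_mb(size_t dicts, size_t mb_each, size_t backends)
{
    return dicts * mb_each * backends;
}

/* One copy of every dictionary, shared by all backends. */
size_t shared_total_mb(size_t dicts, size_t mb_each)
{
    return dicts * mb_each;
}
```

With 2 dictionaries of 3 MB and 20 backends, the first function gives 120 
and the second gives 6.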

As for downsides, I only really see two:
 * Tracking updates of dictionaries - but it's reasonable to believe 
that new connections are opened more often than a dictionary is 
updated. Also, this might be easily solved by stat()-ing the dictionary 
file before starting a session, and having the server reload it only if 
a change is detected.
 * Possibly complicated to implement?
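
The stat() idea could look roughly like the following. This is only a 
sketch of the check itself, not of anything that exists in Postgres: 
DictFileState and dict_needs_reload are names I made up, and I assume that 
comparing modification time and file size is good enough to notice an update.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record of the file state seen at the last (re)load. */
typedef struct DictFileState
{
    bool   loaded;  /* false until the file has been seen once */
    time_t mtime;   /* modification time at the last load */
    off_t  size;    /* file size at the last load */
} DictFileState;

/*
 * Returns 1 if the dictionary file should be (re)loaded, 0 if it is
 * unchanged since the last call, and -1 if stat() fails.
 * On a detected change the recorded state is updated.
 */
int dict_needs_reload(const char *path, DictFileState *state)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    if (state->loaded &&
        st.st_mtime == state->mtime &&
        st.st_size == state->size)
        return 0;

    state->loaded = true;
    state->mtime = st.st_mtime;
    state->size = st.st_size;
    return 1;
}
```

A zero-initialized DictFileState reports 1 on the first call, so the same 
function covers the initial load. An edit that keeps the file size and 
lands within the timestamp granularity would go unnoticed, so this is a 
cheap first check rather than a guarantee.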

As for my second question: is it possible to use custom functions in 
tsearch2, for example by writing my own stemmer in PL/pgSQL or in C as a 
Postgres function?

Thanks in advance for any reply,
Steve