* Oleg Bartunov <oleg@xxxxxxxxxx> [20070420 11:32]:
> >If I understand it correctly, such a dictionary would require writing
> >a custom C component - is that correct? Or could I get away with
> >writing a plpgsql function that does the above and hooking that
> >somehow into the tsearch2 config?
>
> You need to write a C function, see the example in
> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-intdict-xmp.html

Thanks. My colleague, who speaks more C than I do, came up with the code
below, which works fine for us. Will the memory allocated for lexeme be
freed by the caller?

Til

/*
 * Dictionary for partials of a word, i.e. foo => {f,fo,foo}
 *
 * Based on the tsearch2/gendict/config.sh generator
 *
 * Author: Sean Treadway
 *
 * This code is released under the terms of the PostgreSQL License.
 */
#include "postgres.h"

#include "dict.h"
#include "common.h"
#include "subinclude.h"
#include "ts_locale.h"

/* true for bytes 0x80..0xBF, which continue a multibyte UTF-8 character */
#define is_utf8_continuation(c) \
	((unsigned char)(c) >= 0x80 && (unsigned char)(c) <= 0xBF)

PG_FUNCTION_INFO_V1(dlexize_partial);
Datum dlexize_partial(PG_FUNCTION_ARGS);

Datum
dlexize_partial(PG_FUNCTION_ARGS)
{
	char	   *in = (char *) PG_GETARG_POINTER(1);
	char	   *utxt = pnstrdup(in, PG_GETARG_INT32(2));	/* palloc */
	char	   *txt = lowerstr(utxt);	/* palloc */
	int			txt_len = strlen(txt);
	int			results = 0;
	int			i = 0;

	/*
	 * May overallocate, that's ok. palloc0 zeroes the array so the
	 * TSLexeme fields other than lexeme are not left uninitialized.
	 */
	TSLexeme   *res = palloc0(sizeof(TSLexeme) * (txt_len + 1));

	for (i = 1; i <= txt_len; i++)
	{
		/*
		 * Emit a prefix only when the next byte starts a new character
		 * (or is the terminating NUL), so that a multibyte UTF-8
		 * character is never cut in the middle.
		 */
		if (!is_utf8_continuation(txt[i]))
		{
			res[results++].lexeme = pnstrdup(txt, i);
		}
	}
	res[results].lexeme = NULL;

	pfree(utxt);
	pfree(txt);

	/* Receiver must free res memory and res[].lexeme */
	PG_RETURN_POINTER(res);
}
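
In case it helps anyone try the prefix logic outside the backend, here is a minimal standalone sketch of the same loop. It uses plain malloc/calloc instead of palloc and leaves out the lowercasing step. The names utf8_prefixes and free_prefixes are made up for this illustration and are not part of tsearch2.

```c
#include <stdlib.h>
#include <string.h>

/* Same test as in the dictionary: bytes 0x80..0xBF continue a UTF-8 character. */
#define is_utf8_continuation(c) \
	((unsigned char)(c) >= 0x80 && (unsigned char)(c) <= 0xBF)

/* Hypothetical helper: frees an array returned by utf8_prefixes(). */
void
free_prefixes(char **res)
{
	size_t		i;

	if (res == NULL)
		return;
	for (i = 0; res[i] != NULL; i++)
		free(res[i]);
	free(res);
}

/*
 * Hypothetical helper, not part of tsearch2: returns a NULL-terminated,
 * malloc'ed array holding every prefix of txt that ends on a character
 * boundary. Returns NULL if an allocation fails.
 */
char **
utf8_prefixes(const char *txt)
{
	size_t		len = strlen(txt);
	size_t		n = 0;
	size_t		i;

	/* calloc zero-fills, so the array is NULL-terminated from the start */
	char	  **res = calloc(len + 1, sizeof(char *));

	if (res == NULL)
		return NULL;

	for (i = 1; i <= len; i++)
	{
		/* txt[len] is the terminating NUL, so the full word is always kept */
		if (!is_utf8_continuation(txt[i]))
		{
			char	   *p = malloc(i + 1);

			if (p == NULL)
			{
				free_prefixes(res);
				return NULL;
			}
			memcpy(p, txt, i);
			p[i] = '\0';
			res[n++] = p;
		}
	}
	return res;
}
```

Calling utf8_prefixes("foo") yields f, fo, foo followed by a NULL entry. For a word that starts with a two-byte character, the first prefix is that whole character, never its first byte alone.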