Re: Text search dictionary vs. the C locale

Gmail <robjsargent@xxxxxxxxx> · Sun, 2 Jul 2017 11:11:12 -0600


> On Jul 2, 2017, at 10:06 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> 
> twoflower <standa.kurik@xxxxxxxxx> writes:
>> I am having problems creating an Ispell-based text search dictionary for
>> Czech language.
> 
>> Issuing the following command:
> 
>> create text search dictionary czech_ispell (
>>  template = ispell,
>>  dictfile = czech_ispell,
>>  affFile = czech_ispell
>> );
> 
>> ends with
> 
>> ERROR:  syntax error
>> CONTEXT:  line 252 of configuration file
>> "/usr/share/postgresql/9.6/tsearch_data/czech_ispell.affix": " . > TŘIA
> 
>> The dictionary files are in UTF-8. The database cluster was initialized with
>> initdb --locale=C --encoding=UTF8
> 
> Presumably the problem is that the dictionary file parsing functions
> reject anything that doesn't satisfy t_isalpha() (unless it matches
> t_isspace()) and in C locale that's not going to accept very much.
> 
> I wonder why we're doing it like that.  It seems like it'd often be
> useful to load dictionary files that don't match the database's
> prevailing locale.  Do we really need the t_isalpha tests, or would
> it be good enough to assume that anything that isn't t_isspace is
> part of a word?
> 
>            regards, tom lane
> 
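What about punctuation? If anything that isn't t_isspace() counts as part of a word, punctuation would be treated as part of a word too.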
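To make the question concrete, here is a rough sketch of the two rules side by side. It is not PostgreSQL source: strict_accepts() and relaxed_is_word() are hypothetical names, and the sketch assumes that in the C locale the check boils down to plain isalpha()/isspace() on a single byte, which I have not verified against the tsearch code.

```c
#include <ctype.h>

/* Hypothetical model of the current behaviour: a byte is accepted only if
 * it is alphabetic or whitespace. A program that never calls setlocale()
 * runs in the "C" locale, where isalpha() is true only for A-Z and a-z. */
int strict_accepts(unsigned char byte)
{
    return isalpha(byte) || isspace(byte);
}

/* Hypothetical model of the proposed relaxation: anything that is not
 * whitespace is treated as part of a word. */
int relaxed_is_word(unsigned char byte)
{
    return !isspace(byte);
}
```

Under these assumptions strict_accepts(0xC5) is false (0xC5 is the first UTF-8 byte of "Ř"), which would explain the syntax error, while relaxed_is_word('>') and relaxed_is_word('.') are both true, which is the case I am asking about.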
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

