Search Postgresql Archives

Re: Accent insensitive search

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hi. I have a few databases created with UNICODE encoding, and I would like to be able to search with accent insensitivity. There's something in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do this, but I found nothing in PostgreSQL, just the 'to_ascii' function, which AFAIK, doesn't work with UNICODE.

The easiest way is to create an extra column which will hold a copy of your text, with all accents removed. You can also convert it to lowercase and remove apostrophes, punctuation etc. Said column is kept up to date with a trigger.
	Python is suitable for this (use unicodedata.normalize).
Keeping a copy of the processed data will speed up search versus WHERE remove_accents( blah ) = 'text', even with a function index. Note that this function could be written in C and use a table on the first 64K unicode symbols for speedup.

	See attached file.

Attachment: create_ft_functions.sql
Description: Binary data


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux