Search Postgresql Archives

Re: DB encoding, locale and indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sterfield <sterfield@xxxxxxxxx> writes:
> I'm a sysadmin working for an application that stores all its data in a PG
> database.
> Currently, the cluster has its encoding set to UTF-8, and the locale (both
> LC_COLLATE and LC_CTYPE) is set to 'en_US.UTF-8'.

> I discovered recently that the indexes created on varchar fields are not
> working for LIKE operator, as they are created without the correct class
> operator (as stated in
> http://www.postgresql.org/docs/9.2/static/indexes-types.html).

Right, because en_US.UTF-8 uses dictionary sort order rather than plain
byte-by-byte sort.

> The most straight-forward solution would be to create a second index on the
> same field but with the class operator, in order to have indexes both for
> =, >, < operators AND LIKE / regexp operators. Few additional indexes, some
> diskspace eaten, problem solved.

Yup.

> However, some people are saying that nothing has to change on the index,
> and that the only thing we have to do is to change the LC_COLLATE of each
> databases to 'C', in order for the indexes to work without the class
> operator.

Yes, that is another possible solution, and it's documented.  Keep in mind
though that you can *not* just reach into pg_database and tweak those
fields; if you did, all your indexes would be corrupt, because they'd no
longer match the sort order the system is expecting.  The only safe way to
get there would be to dump and reload into a new database set up this way.
(If you wanted to live dangerously, I guess you could manually tweak the
pg_database fields and then REINDEX every affected index ... but this
still involves substantial downtime, and I would not recommend doing it
without practicing on a test installation.)

You also have to ask whether any of your applications are expecting ORDER
BY some-text-field to produce dictionary order rather than ASCII order.

>    - If I have unicode character stored in my database (for example 'é'),
>    and the LC_COLLATE set to 'C', how the index will behave if I do a query
>    with LIKE '<something>é%' ?

It's still the same character, but it will sort in a possibly unexpected
way.

			regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux