Re: Japanese words not distinguished

Harry Mantheakis <harry@xxxxxxxxxxxxxxxxxxxxxxxxxx> · Tue, 12 Jul 2005 19:39:17 +0100

> Hmm, is that actually the correct spelling of the locale?  On my Linux
> box, locale -a says it's "en_GB.utf8".  I'm not sure how well initdb can
> verify the validity of a locale parameter, especially back in the 7.4
> branch.  It could be that you are actually using a locale that doesn't
> use UTF8 encoding, in which case this behavior is not unheard of
> (still pretty broken, IMHO, but I've seen plenty of locale definitions
> that just fail on data outside their supported character set).

Calling "locale -a" on my Linux server also lists "en_GB.utf8".

It also lists "en_US.utf8", and yet all the related environment variables
(LC_COLLATE, etc.) report their locale setting as "en_US.UTF-8".

I do not know what to make of that.

> If you did correctly specify a UTF8-using locale, you probably ought to
> report this behavior to your Linux supplier as a bug in that locale
> definition.  It doesn't have to sort or case-fold random UTF8 data very
> nicely, but it certainly shouldn't report distinct strings as equal.

I'll look into that. I'm running Fedora Core 3.
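For the bug report, one way to test the locale definition outside PostgreSQL is to ask the C library directly whether two distinct strings collate as equal. As I understand it, the server relies on the C library's collation when a non-C locale is in use. Below is a minimal Python sketch of such a check. The function name `collates_equal` and the sample words are my own illustrative choices. The locale names are simply the ones `locale -a` and the environment report here and may not be installed elsewhere. Python's `locale.strcoll` goes through the C library's collation routines, so it should reflect the same locale definition.

```python
import locale


def collates_equal(locale_name: str, a: str, b: str) -> bool:
    """Return True if the C library's collation for locale_name treats a and b as equal."""
    # Remember the current LC_COLLATE setting so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if the named locale is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strcoll returns 0 when the two strings sort as equal in this locale.
        return locale.strcoll(a, b) == 0
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    # Two distinct Japanese words; a sane UTF-8 locale should not report them as equal.
    first, second = "日本", "東京"
    for name in ("C", "en_GB.utf8", "en_US.UTF-8"):
        try:
            print(name, collates_equal(name, first, second))
        except locale.Error:
            print(name, "not installed")
```

If a UTF-8 locale prints True for two different words while "C" prints False, that points at the locale definition rather than at PostgreSQL.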

Meanwhile, am I correct in assuming that re-initialising my database cluster
with "--locale=C" will solve the problem?

What is more, am I correct in assuming that I can then restore my data with
pg_restore, as prescribed in the documentation?

Kind regards

Harry Mantheakis
London, UK
