Re: utf8 vs UTF-8


On 5/18/24 07:48, Troels Arvin wrote:
Hello,

Tom Lane wrote:
 >>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
 >>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
 >
 > On most if not all platforms, both those spellings of the locale names
 > will be taken as valid.  You might try running "locale -a" to get an
 > idea of which one is preferred according to your current libc
 > installation

"locale -a" on the Ubuntu system outputs this:

   C
   C.utf8
   en_US.utf8
   POSIX

If you expand that to locale -v -a you get:

locale: en_US.utf8      archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | English locale for the USA
   source | Free Software Foundation, Inc.
  address | https://www.gnu.org/software/libc/
    email | bug-glibc-locales@xxxxxxx
 language | American English
territory | United States
 revision | 1.0
     date | 2000-06-24
  codeset | UTF-8
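
The listing above reports codeset UTF-8 for the locale named en_US.utf8, which suggests the two spellings name the same locale. A minimal sketch of comparing locale names while ignoring the spelling of the codeset part follows. The function name normalize_locale is hypothetical, and the rule it applies (lowercase the codeset and drop non-alphanumeric characters) is an assumption modelled on how glibc is generally understood to treat codeset names; it is not something Postgres does.

```shell
#!/bin/sh
# Hypothetical helper: normalize the codeset part of a locale name so that
# "en_US.UTF-8" and "en_US.utf8" compare equal. Assumed rule: lowercase the
# codeset and strip every character that is not a letter or digit.
normalize_locale() {
    name=$1
    case $name in
        *.*) ;;                                  # has a codeset part
        *) printf '%s\n' "$name"; return ;;      # e.g. "C" or "POSIX"
    esac
    base=${name%%.*}                             # language_TERRITORY
    rest=${name#*.}                              # codeset[@modifier]
    modifier=
    case $rest in
        *@*) modifier=@${rest#*@}; rest=${rest%%@*} ;;
    esac
    codeset=$(printf '%s' "$rest" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]')
    printf '%s.%s%s\n' "$base" "$codeset" "$modifier"
}

# Compare the two spellings from this thread.
if [ "$(normalize_locale en_US.UTF-8)" = "$(normalize_locale en_US.utf8)" ]; then
    echo "same locale after normalization"
fi
```

This only compares the names as strings; whether a given spelling is accepted still depends on the libc installation.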



So at first, I thought en_US.utf8 would be the most correct locale identifier. However, when I look at Postgres' own databases, they have the slightly different locale string:

   psql --list | grep -E 'postgres|template'
   postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
   template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
   template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...

Also, when I try to create a database with "en_US.utf8" as locale without specifying a template:

troels=# create database test4 locale 'en_US.utf8';
ERROR:  new collation (en_US.utf8) is incompatible with the collation of the template database (en_US.UTF-8)
HINT:  Use the same collation as in the template database, or use template0 as template.
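
For reference, the second option in the HINT (use template0 as template) could look roughly like the sketch below. The helper name create_db_sql is hypothetical, and it does no identifier quoting; it only prints the statement so it can be reviewed before being piped into psql. Whether the statement succeeds still depends on the server's libc accepting the given locale name.

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE DATABASE statement that copies from
# template0, so the new database's locale need not match template1's.
# $1 = database name, $2 = locale name (neither is quoted or validated here).
create_db_sql() {
    printf "CREATE DATABASE %s LOCALE '%s' TEMPLATE template0;\n" "$1" "$2"
}

# Print the statement for the case from this thread.
create_db_sql test4 en_US.utf8

# To actually run it, pipe the output into psql:
#   create_db_sql test4 en_US.utf8 | psql
```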

I'm going to say that is Postgres being exact to a fault.


Given the locale of Postgres' own databases and Postgres' error message, I'm leaning toward en_US.UTF-8 being the most correct locale to use. Why else would Postgres care, if the utf8/UTF-8 spelling doesn't matter?


 > but TBH, I doubt it's worth worrying about.

But couldn't there be an issue, if for example the client's locale and the server's locale aren't exactly the same? I'm thinking maybe the client library has to perform unneeded translation of the stream of data to/from the database?



--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx





