
Re: which charset use for cyrilic?

On Saturday, 29.10.2005, at 13:11 +0400, Zet wrote:
> Hi
> 
> Which charset needs to be set in the database for Cyrillic?
> 
> I've used till now WIN, but today I found a problem

win?
> 
> for example:
> 
> SELECT *
> FROM table
> WHERE a = 'слово'
> 
> returns me a record, where a = 'фраза'
> 
> Afterwards I tried UNICODE,
> but for most Cyrillic words PG gives an error like
> "invalid byte sequence for encoding "UNICODE":..."

Well, for Cyrillic you have the following options:

cp-1251 (Windows code page; this is the WIN setting in PG)
KOI8-R (traditional charset; the KOI8 setting in PG)
UTF-8 (universal, if you want to have Latin characters coexist
       with Cyrillic. This is also what you get with the
       UNICODE setting in PG)

You should use the same encoding in the database as
you use in your application to make things easier.
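
To see why the encodings cannot be mixed, here is a minimal Python sketch
(Python is only my choice for illustration, nothing in this thread uses it).
It encodes the word from the example query in all three charsets and checks
which byte strings would pass as UTF-8. The helper name is_valid_utf8 is
made up for this sketch.

```python
word = "слово"

# The same five letters become different bytes in each encoding.
cp1251 = word.encode("cp1251")  # Windows code page, one byte per letter
koi8r = word.encode("koi8_r")   # KOI8-R, one byte per letter
utf8 = word.encode("utf-8")     # UTF-8, two bytes per Cyrillic letter

print(cp1251.hex(), koi8r.hex(), utf8.hex())


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# cp-1251 bytes are not well-formed UTF-8, which is the kind of input
# that triggers an "invalid byte sequence" complaint.
print(is_valid_utf8(cp1251))
print(is_valid_utf8(utf8))
```

So if the application sends cp-1251 bytes while the server expects UTF-8,
errors like the one quoted above are what I would expect.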

Now, you already have some data in your database.
So if you want to change the encoding, you need
to recode your char, varchar and text columns.

1.) Find out the current setting of your database:
     SHOW server_encoding;

If this matches what you want, you are done with
this step.

If you get something like SQL_ASCII, then you don't
know which charset actually got used. In this case,
inspect your application to find out which encoding
it used to store text.
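
If the application gives no clear answer, a sample of the stored bytes can
give a hint. The following is only a rough heuristic sketch in Python, not a
replacement for checking the application: the function name
guess_cyrillic_encoding is made up, it only knows the three charsets listed
above, and it assumes the sample is mostly lowercase Russian text.

```python
def guess_cyrillic_encoding(data: bytes) -> str:
    """Guess whether a byte sample is UTF-8, cp-1251 or KOI8-R."""
    # UTF-8 is strict, so text in a single-byte charset almost never
    # decodes cleanly as UTF-8. Try it first.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # For the single-byte candidates, count how many lowercase Cyrillic
    # letters each decoding yields. The wrong charset tends to turn
    # lowercase text into uppercase letters, so it scores low.
    def score(encoding: str) -> int:
        text = data.decode(encoding, errors="replace")
        return sum("а" <= ch <= "я" for ch in text)

    # On a tie, max() returns the first candidate, i.e. cp1251.
    return max(("cp1251", "koi8_r"), key=score)


sample = "слово".encode("koi8_r")
print(guess_cyrillic_encoding(sample))
```

Treat the result as a hint only; short or mixed-case samples can fool it.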

Make a complete backup, then check your
lc_* variables:

SHOW lc_collate;
SHOW lc_ctype;
SHOW lc_messages; (and so on)

If they are not something like

ru_RU.UTF-8 (if it is UNICODE you want to use; the exact
             spelling of the locale name depends on your OS)

then you had better run initdb again with the
correct locale settings. This is important
for lower(), upper(), ILIKE, ORDER BY, etc.
to work.
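
As an analogy for what goes wrong without a matching locale, this small
Python sketch shows plain byte-wise behaviour, which is roughly how I would
expect a cluster initialized with the "C" locale to treat Cyrillic text.
The sample words are made up, and Python here only stands in for the idea;
it does not reproduce the server's code.

```python
words = ["яблоко", "ёж", "арбуз"]

# Byte-wise ordering: ё (U+0451) lands after я (U+044F), although in the
# Russian alphabet ё comes long before я.
bytewise = sorted(words, key=lambda w: w.encode("utf-8"))
print(bytewise)

# Byte-wise case conversion only knows ASCII letters, so cp-1251 bytes
# pass through upper() unchanged.
raw = "слово".encode("cp1251")
print(raw.upper() == raw)

# With knowledge of the alphabet, the conversion works as expected.
print("слово".upper())
```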

Recreate your DB with the encoding set to UNICODE
(or whatever you want to use; it has to match
the locale).

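Create a text dump out of your backup via
pg_restore (it is recommended to back up using pg_dump -Fc).
In the text dump, find the line

SET CLIENT_ENCODING TO '...'; (this is what your
original database had)

This line has to name the encoding your data is really
stored in. If it already does (for example WIN), leave
it alone. If it says SQL_ASCII, replace it with the
encoding your application actually used, for example:

SET CLIENT_ENCODING TO 'WIN';

(this can be done with sed if you don't want
to load the whole dump into your editor)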
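
If sed is not at hand, the same one-line replacement can be sketched in
Python. This is a minimal sketch under assumptions: the function name
set_dump_encoding is made up, the new encoding name is whatever you decided
on above, and the pattern accepts both the "TO" and the "=" spelling of the
statement because I am not sure which one your dump contains.

```python
import re

# Matches a statement such as: SET client_encoding = 'SQL_ASCII';
# or: SET CLIENT_ENCODING TO 'WIN';
_CLIENT_ENCODING = re.compile(
    r"^(\s*SET\s+client_encoding\s*(?:=|TO)\s*)'[^']*'(\s*;)",
    re.IGNORECASE | re.MULTILINE,
)


def set_dump_encoding(dump_sql: str, encoding: str) -> str:
    """Replace the encoding name in the first SET client_encoding line."""
    new_sql, count = _CLIENT_ENCODING.subn(
        lambda m: f"{m.group(1)}'{encoding}'{m.group(2)}",
        dump_sql,
        count=1,
    )
    if count == 0:
        raise ValueError("no SET client_encoding statement found in dump")
    return new_sql


dump = "SET client_encoding = 'SQL_ASCII';\nCREATE TABLE t (a text);\n"
print(set_dump_encoding(dump, "WIN"))
```

When applying this to a real dump file, open it with encoding="latin-1" for
both reading and writing, so that every data byte passes through unchanged
and only the SET line is touched.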

Restore the database with the new script.
Postgres will take care of the charset conversion.




