On Fri, Mar 6, 2015 at 3:55 AM, lsliang <lsliang@xxxxxxxxxxxxxxx> wrote:
> 2015-03-06
>
> From: Adrian Klaver
> Sent: 2015-03-05 21:31:39
> To: lsliang; pgsql-general
> Subject: Re: can postgresql supported utf8mb4 character sets?
>
>> On 03/05/2015 01:45 AM, lsliang wrote:
>>> can postgresql supported utf8mb4 character set?
>>> today mobile apps support 4-byte character and utf8 can only
>>> support 1-3 bytes character
>>
>> The docs would seem to indicate otherwise:
>>
>>> if load string to database which contain a 4-byte character
>>> will failed .
>>
>> Have you actually tried to load strings in to Postgres?
>> If so and it failed what was the method you used and what was the error?
>>
>>> mysql since 5.5.3 support utf8mb4 character sets
>>> I don't find some information about postgresql
>>> thanks
>>
>> --
>> Adrian Klaver
>
> thanks for your help. postgresql can support 4-byte character
>
> test=> select * from utf8mb4_test ;
> ERROR: character with byte sequence 0xf0 0x9f 0x98 0x84 in encoding "UTF8" has no equivalent in encoding "GB18030"
> test=> \encoding utf8
> test=> select * from utf8mb4_test ;
> content
> ---------
> 😄😄
> pcauto=>
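UTF-8 support works fine. The 3-byte limit is something MySQL invented: its original "utf8" character set stores at most 3 bytes per character, which is why it needed a separate utf8mb4. But it only works if your client encoding is also UTF-8. In your example, the client encoding (and presumably your terminal) was GB18030, as the error message shows, not UTF-8; after \encoding utf8 the same query works.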
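A minimal Python sketch of the difference, assuming MySQL's original "utf8" behaves as a BMP-only encoding (code points up to U+FFFF, hence at most 3 bytes each). The helper names are hypothetical and only illustrate the byte counts; they are not part of PostgreSQL or MySQL.

```python
def utf8_byte_counts(text: str) -> list[int]:
    """Bytes used by each code point of text in standard UTF-8 (always 1 to 4)."""
    return [len(ch.encode("utf-8")) for ch in text]


def fits_in_3_byte_utf8(text: str) -> bool:
    """Hypothetical check for a 3-byte-limited "utf8" like MySQL's original one:
    only code points up to U+FFFF (the Basic Multilingual Plane) fit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


for sample in ["A", "\u99ac", "\U0001F604"]:  # A, 馬, 😄
    print(sample, utf8_byte_counts(sample), fits_in_3_byte_utf8(sample))
```

This prints byte counts of [1], [3] and [4]; only the emoji fails the 3-byte check, while standard UTF-8 (which PostgreSQL's UTF8 encoding is) handles all three.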
create table test (glyph text);
insert into test values ('A'), ('馬'), ('𐁀'), ('😄'), ('🇪🇸');
select glyph, convert_to(glyph, 'utf-8'), length(glyph) FROM test;
 glyph |     convert_to     | length
-------+--------------------+--------
 A     | \x41               |      1
 馬    | \xe9a6ac           |      1
 𐁀     | \xf0908180         |      1
 😄    | \xf09f9884         |      1
 🇪🇸    | \xf09f87aaf09f87b8 |      2
(5 rows)
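The convert_to and length columns can be cross-checked outside the database. A sketch using only the Python standard library (3.9+ for the type hints), assuming that PostgreSQL's length() on text in a UTF8 database counts code points, which is what Python's len() counts; describe() is a hypothetical helper name.

```python
# The five glyphs from the query above, written as escapes: A, 馬, 𐁀, 😄, 🇪🇸
GLYPHS = ["A", "\u99ac", "\U00010040", "\U0001F604", "\U0001F1EA\U0001F1F8"]


def describe(glyph: str) -> tuple[str, str, int]:
    """Return (glyph, UTF-8 bytes in psql's \\x hex style, number of code points)."""
    hex_bytes = "\\x" + glyph.encode("utf-8").hex()
    # The Spanish flag is two regional-indicator code points, so len() gives 2.
    return glyph, hex_bytes, len(glyph)


for row in map(describe, GLYPHS):
    print(" | ".join(str(col) for col in row))
```

The flag shows why length 2 is expected: it renders as one symbol but is stored as two 4-byte code points.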
What doesn't work is GB18030:
select glyph, convert_to(glyph, 'GB18030'), length(glyph) FROM test;
ERROR: character with byte sequence 0xf0 0x90 0x81 0x80 in encoding "UTF8" has no equivalent in encoding "GB18030"
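I think that is a bug: GB18030 is designed to cover all of Unicode, including characters outside the Basic Multilingual Plane, so these characters do have GB18030 equivalents.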
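As a cross-check, Python's standard gb18030 codec encodes both characters from the error messages in this thread to four-byte GB18030 sequences and decodes them back unchanged. This is a sketch using only the standard library; it shows what Python's codec does, not how PostgreSQL's conversion tables are built, and gb18030_roundtrip() is a hypothetical helper name.

```python
# Characters outside the BMP that PostgreSQL refused to convert: 𐁀 and 😄
SUPPLEMENTARY = ["\U00010040", "\U0001F604"]


def gb18030_roundtrip(ch: str) -> bytes:
    """Encode one character with Python's gb18030 codec and check it decodes back unchanged."""
    encoded = ch.encode("gb18030")
    assert encoded.decode("gb18030") == ch
    return encoded


for ch in SUPPLEMENTARY:
    encoded = gb18030_roundtrip(ch)
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```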
Regards, Arjen