Hi,
Might "GB18030 server-side support" deserve reconsideration, about 15 years after the release of GB18030-2005?
It may be one of the greenest features PostgreSQL could offer.
1. In this era of big data and mobile computing, in the country with the largest population, spending 50% more disk space and energy on Chinese characters (UTF-8 usually takes 3 bytes per Chinese character, while GB18030 takes only 2) does real harm to carbon neutrality and contributes to polar ice melting.
2. The suggestion in the mail below, "set the client side to UTF-8, just as the server side is set to UTF-8", is not practical for most Chinese IT projects, especially publicly funded ones, because GB18030 compliance is required by law in Mainland China.
A client-side encoding setting exposed through a GUI is usually harder to hide, and most MS Windows users are familiar with GB18030.
MySQL has supported GB18030 on the server side since V5.7 in 2015. I am not sure how much this feature has contributed to MySQL's greater popularity in Mainland China.
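The byte counts in point 1 can be checked outside the database. The following is only a minimal sketch using Python's built-in "utf-8" and "gb18030" codecs; the helper name encoded_sizes is hypothetical, and the sketch says nothing about how PostgreSQL would store the data on disk.

```python
# Compare how many bytes one character takes in UTF-8 and in GB18030.
# Uses only Python's standard codecs; encoded_sizes is an illustrative helper.

def encoded_sizes(ch):
    """Return (utf8_bytes, gb18030_bytes) for the given string."""
    return len(ch.encode("utf-8")), len(ch.encode("gb18030"))

# ASCII letter, common Chinese character, emoji outside the BMP
for ch in ["A", "馬", "😄"]:
    utf8_len, gb_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: utf-8={utf8_len} bytes, gb18030={gb_len} bytes")
```

For a common Chinese character this gives 3 bytes against 2, which is where the 50% figure comes from. ASCII text and characters outside the Basic Multilingual Plane take the same number of bytes in both encodings, so the saving applies only to text that is mostly Chinese.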
Parker Han
From: pgsql-general-owner@xxxxxxxxxxxxxx <pgsql-general-owner@xxxxxxxxxxxxxx> on behalf of Arjen Nienhuis <a.g.nienhuis@xxxxxxxxx>
Sent: Saturday, March 7, 2015 8:18
To: lsliang <lsliang@xxxxxxxxxxxxxxx>
Cc: Adrian Klaver <adrian.klaver@xxxxxxxxxxx>; pgsql-general <pgsql-general@xxxxxxxxxxxxxx>
Subject: Re: Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets?

On Fri, Mar 6, 2015 at 3:55 AM, lsliang <lsliang@xxxxxxxxxxxxxxx> wrote:
UTF-8 support works fine. The 3-byte limit was something MySQL invented. But it only works if your client encoding is UTF-8. In your example, your terminal is not set to UTF-8.
create table test (glyph text);
insert into test values ('A'), ('馬'), ('𐁀'), ('😄'), ('🇪🇸');
select glyph, convert_to(glyph, 'utf-8'), length(glyph) FROM test;

 glyph |     convert_to     | length
-------+--------------------+--------
 A     | \x41               |      1
 馬    | \xe9a6ac           |      1
 𐁀     | \xf0908180         |      1
 😄    | \xf09f9884         |      1
 🇪🇸    | \xf09f87aaf09f87b8 |      2
(5 rows)

What doesn't work is GB18030:

select glyph, convert_to(glyph, 'GB18030'), length(glyph) FROM test;
ERROR: character with byte sequence 0xf0 0x90 0x81 0x80 in encoding "UTF8" has no equivalent in encoding "GB18030"

I think that is a bug.
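As a rough cross-check that GB18030 itself can represent these glyphs, the same five strings can be round-tripped through Python's built-in "gb18030" codec. This is only a sketch under that assumption: the helper gb18030_round_trip is hypothetical, and Python's codec is a separate implementation from PostgreSQL's conversion routines, so it shows what the encoding allows, not what the server does.

```python
# Round-trip the glyphs from the test table through GB18030.
# gb18030_round_trip is an illustrative helper, not part of any library.

def gb18030_round_trip(text):
    """Encode to GB18030, decode back, and report (byte_length, survived)."""
    encoded = text.encode("gb18030")
    return len(encoded), encoded.decode("gb18030") == text

glyphs = ["A", "馬", "𐁀", "😄", "🇪🇸"]

for g in glyphs:
    size, ok = gb18030_round_trip(g)
    codepoints = " ".join(f"U+{ord(c):04X}" for c in g)
    print(f"{codepoints}: {size} bytes, round trip ok: {ok}")
```

Characters outside the Basic Multilingual Plane, such as U+10040, come out as 4-byte GB18030 sequences and decode back unchanged, which is consistent with reading the PostgreSQL error as a gap in the conversion rather than a limit of the encoding.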
Gr. Arjen