Re: Re: May "PostgreSQL server side GB18030 character set support" reconsidered?

Han Parker <parker.han@xxxxxxxxxxx> · Tue, 6 Oct 2020 03:13:06 +0000

From: Tatsuo Ishii <ishii@xxxxxxxxxxxx>
Sent: October 6, 2020 2:15
To: tgl@xxxxxxxxxxxxx <tgl@xxxxxxxxxxxxx>
Cc: parker.han@xxxxxxxxxxx <parker.han@xxxxxxxxxxx>; pgsql-general@xxxxxxxxxxxxxx <pgsql-general@xxxxxxxxxxxxxx>
Subject: Re: Re: May "PostgreSQL server side GB18030 character set support" reconsidered?

> Hmm ... interesting idea, basically invent our own modified version
> of GB18030 (or SJIS?) for backend-internal storage.  But I'm not
> sure how to make it work without enlarging the string, which'd defeat
> the OP's argument.  It looks to me like the second-byte code space is
> already pretty full in both encodings.

> But as he already admitted, actually GB18030 is 4 byte encoding, rather
> than 2 bytes. So maybe we could find a way to map original GB18030 to
> ASCII-safe GB18030 using 4 bytes.

> As for SJIS, no big demand for the encoding in Japan these days. So I
> think we can leave it as it is.

> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp

So the key lies in a simple ASCII-safe mapping algorithm for GB18030 (perhaps named "GB18030as", short for GB18030_ascii_safe), one that does not break ASCII safety while still saving a lot of storage (the ASCII-safe GB2312 contains the 6,763 most frequently used characters).
In fact, it was GBK, designed by Microsoft, that broke ASCII safety in about 1995 with the popularity of Win95. GB18030 later inherited this because it had to stay compatible with GBK.
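
To make "ASCII-safe" concrete, here is a minimal Python sketch (my own illustration, not part of any proposal or of the standard). It assumes Python's built-in "gb2312", "gbk" and "gb18030" codecs; the helper names ascii_bytes_in and is_ascii_safe are hypothetical. It simply checks whether the encoding of one non-ASCII character contains any byte below 0x80, which is the property a server-side encoding must not have.

```python
def ascii_bytes_in(char: str, encoding: str) -> list:
    """Return the bytes below 0x80 in the encoding of one non-ASCII character."""
    if len(char) != 1 or ord(char) < 0x80:
        raise ValueError("expected a single non-ASCII character")
    return [b for b in char.encode(encoding) if b < 0x80]


def is_ascii_safe(char: str, encoding: str) -> bool:
    """True if no byte of the encoded character could be mistaken for ASCII."""
    return not ascii_bytes_in(char, encoding)


if __name__ == "__main__":
    # Sample characters chosen for illustration only.
    samples = [
        ("中", "gb2312"),           # two-byte GB2312 character
        ("丂", "gbk"),              # character from the GBK extension area
        ("\U0001F600", "gb18030"),  # needs the four-byte GB18030 form
    ]
    for char, encoding in samples:
        encoded = char.encode(encoding)
        print(encoding, encoded.hex(), is_ascii_safe(char, encoding))
```

With these samples, the GB2312 character uses only high bytes, while the GBK and four-byte GB18030 samples each contain at least one byte in the ASCII range. An ASCII-safe mapping would have to eliminate exactly those bytes.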

Thanks.
I will try to find out whether the GB18030 standard maintainers' community has any opinions on "a simple ASCII-safe GB18030 mapping algorithm".