RE: Re: About Unicode IVS

荒井元成 <n2029@xxxxxxxxxxxxx> · Wed, 30 Mar 2022 09:06:06 +0900

Variant forms (IVS) cannot be resolved by normalization.
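
As a rough illustration outside the database (a Python sketch, not PostgreSQL behavior): NFC normalization leaves the sequence U+8FBA U+E0102 from this thread unchanged, because a variation selector has no composed form. If one logical character per IVS is wanted, one possible approach is to skip variation selectors when counting. The helper name `ivs_aware_length` and the choice of ranges (U+FE00..U+FE0F and U+E0100..U+E01EF) are assumptions made for this example only.

```python
import unicodedata

# Variation selector ranges: VS1-VS16 and the supplement VS17-VS256.
VARIATION_SELECTOR_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))


def is_variation_selector(ch: str) -> bool:
    """Return True if ch falls in one of the variation selector ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VARIATION_SELECTOR_RANGES)


def ivs_aware_length(text: str) -> int:
    """Hypothetical helper: count code points, ignoring variation selectors."""
    return sum(1 for ch in text if not is_variation_selector(ch))


# The IVS from the thread: U+8FBA followed by U+E0102.
ivs = "\u8fba\U000e0102"

# NFC has nothing to compose here, so the string comes back unchanged.
assert unicodedata.normalize("NFC", ivs) == ivs

print(len(ivs))               # 2 code points
print(ivs_aware_length(ivs))  # 1 when the selector is skipped
```

The same idea could presumably be carried into the database with a user-defined function that strips those ranges before calling char_length(); that variant is not tested here.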

Moto.

-------- Original Message --------
Subject: Re: About Unicode IVS
Date: 2022-03-29 19:02
From: Holger Jakobs <holger@xxxxxxxxxx>
To: pgsql-admin@xxxxxxxxxxxxxxxxxxxx

It's totally correct that the two characters are still two characters.

You would have to normalize the string first, so that the combination becomes one character.
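
To illustrate what normalization does for a combining sequence, here is a small Python sketch (outside PostgreSQL; the sample string is an assumption, chosen because it has a precomposed form). Under NFC, 'e' followed by U+0301 composes to the single code point U+00E9. This only helps when Unicode defines a precomposed character for the pair.

```python
import unicodedata

# 'e' + COMBINING ACUTE ACCENT: two code points before normalization.
decomposed = "e\u0301"

# NFC composes the pair into the precomposed character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))       # 2
print(len(composed))         # 1
print(composed == "\u00e9")  # True
```

PostgreSQL 13 and later offer a normalize() SQL function for the same kind of conversion inside the database.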

More information about this topic, which is in part beyond PostgreSQL:

  * https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about [1]
  * https://en.wikipedia.org/wiki/Unicode_equivalence [2]

Regards,

Holger

On 29.03.22 at 11:55, 荒井元成 wrote:

> Thank you for your reply.
> 
> Changing the collation order and CTYPE did not change the behavior.
> 
>    Name    |  Owner  | Encoding |   Collate   |    Ctype    |  Access privileges
> -----------+---------+----------+-------------+-------------+---------------------
>  D209007   | D209007 | UTF8     | C           | C           |
>  postgres  | D209007 | UTF8     | C           | C           |
>  template0 | D209007 | UTF8     | C           | C           | =c/D209007         +
>            |         |          |             |             | D209007=CTc/D209007
>  template1 | D209007 | UTF8     | C           | C           | =c/D209007         +
>            |         |          |             |             | D209007=CTc/D209007
>  template2 | D209007 | UTF8     | ja_JP.UTF-8 | ja_JP.UTF-8 |
> (5 rows)
> 
> D209007=# \c template2
> You are now connected to database "template2" as user "D209007".
>
> template2=# select char_length(U&'\+0000E6' || U&'\+000300');
>  char_length
> -------------
>            2
> (1 row)
>
> template2=# select char_length(U&'\+008FBA' || U&'\+0E0102');
>  char_length
> -------------
>            2
> (1 row)
>
> template2=# select length(U&'\+008FBA' || U&'\+0E0102');
>  length
> --------
>       2
> (1 row)
> 
> Moto.
> 
> FROM: Michel SALAIS <msalais@xxxxxxx>
> SENT: Tuesday, March 29, 2022 6:35 PM
> TO: '荒井元成' <n2029@xxxxxxxxxxxxx>; 'David G. Johnston'
> <david.g.johnston@xxxxxxxxx>
> CC: pgsql-admin@xxxxxxxxxxxxxxxxxxxx
> SUBJECT: RE: About Unicode IVS
> 
> Hi,
> 
> I think this has something to do with collation and ctype. As I see 
> you have it set to “C” for all your databases (even if I don’t 
> understand your titles 😊).
> 
> _Michel SALAIS_
> 
> FROM: 荒井元成 <n2029@xxxxxxxxxxxxx>
> SENT: Tuesday, March 29, 2022 06:35
> TO: 'David G. Johnston' <david.g.johnston@xxxxxxxxx>
> CC: pgsql-admin@xxxxxxxxxxxxxxxxxxxx
> SUBJECT: RE: About Unicode IVS
> 
> Thank you for your reply.
>
> It is still 2 characters.
>
> select char_length(U&'\+008FBA' || U&'\+0E0102');
>  char_length
> -------------
>            2
> (1 row)
> 
> select length('辺󠄂');
>  length
> --------
>       2
> (1 row)
> 
> select char_length('辺󠄂');
>  char_length
> -------------
>            2
> (1 row)
> 
> $ psql -l
>                        List of databases
>    Name    |  Owner  | Encoding | Collate | Ctype |  Access privileges
> -----------+---------+----------+---------+-------+---------------------
>  D209007   | D209007 | UTF8     | C       | C     |
>  postgres  | D209007 | UTF8     | C       | C     |
>  template0 | D209007 | UTF8     | C       | C     | =c/D209007         +
>            |         |          |         |       | D209007=CTc/D209007
>  template1 | D209007 | UTF8     | C       | C     | =c/D209007         +
>            |         |          |         |       | D209007=CTc/D209007
> (4 rows)
> 
> $ cat pgdata/PG_VERSION
> 
> 13
> 
> Moto.
> 
> FROM: David G. Johnston <david.g.johnston@xxxxxxxxx>
> SENT: Tuesday, March 29, 2022 12:38 PM
> TO: 荒井元成 <n2029@xxxxxxxxxxxxx>
> CC: pgsql-admin@xxxxxxxxxxxxxxxxxxxx
> SUBJECT: Re: About Unicode IVS
> 
> On Monday, March 28, 2022, 荒井元成 <n2029@xxxxxxxxxxxxx> wrote:
> 
>> Hi,
>> 
>> The length() function returns 2 characters where I want it to
>> return 1 character.
>>
>> Is it possible to handle this by changing a setting, such as the
>> collation, as in SQL Server?
>>
>> Also, if you know how to deal with it (e.g., by creating your own
>> function), it would be helpful if you could provide as much
>> information as you can.
> 
> Try char_length(text) instead.
> 
> David J.

--
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Links:
------
[1] https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
[2] https://en.wikipedia.org/wiki/Unicode_equivalence