RE: About Unicode IVS

Graham Myers <gmyers@xxxxxxxxxxxxxxxxx> · Tue, 29 Mar 2022 09:26:09 +0100

Thank you for the explanation; Unicode always blows my mind 😊  The problem is that Postgres counts code points, and your example contains two: the base character U+8FBA and the variation selector U+E0102.
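
If you need a count that is closer to what is displayed, one possible workaround is a small helper that strips variation selectors and combining marks before counting. This is only a rough sketch: the name display_length is made up, the three code point ranges are my own choice and do not cover full grapheme clustering (emoji sequences, Hangul jamo and so on), and it assumes a UTF8 database with standard_conforming_strings = on so that the backslash escapes reach the regular expression engine unchanged.

```sql
-- Hypothetical helper (not a built-in): counts code points after removing
-- combining marks and variation selectors, so "base + selector" counts once.
-- Assumes a UTF8 database and standard_conforming_strings = on.
CREATE OR REPLACE FUNCTION display_length(s text)
RETURNS integer
LANGUAGE sql
IMMUTABLE
STRICT
AS $$
    SELECT char_length(
        regexp_replace(
            s,
            -- U+0300-U+036F   combining diacritical marks
            -- U+FE00-U+FE0F   variation selectors (VS1-VS16)
            -- U+E0100-U+E01EF variation selectors supplement (used by IVS)
            '[\u0300-\u036F\uFE00-\uFE0F\U000E0100-\U000E01EF]',
            '',
            'g'
        )
    );
$$;

-- Same string as in the example: U+8FBA followed by U+E0102.
SELECT display_length(U&'\+008FBA' || U&'\+0E0102');
```

With your example string, the selector U+E0102 falls in the last range and is removed, so only the base character should be left to count. length() and char_length() themselves are unaffected and keep returning the number of code points.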

Graham Myers

From: 荒井元成 <n2029@xxxxxxxxxxxxx> 
Sent: 29 March 2022 09:21
To: 'Graham Myers' <gmyers@xxxxxxxxxxxxxxxxx>; 'David G. Johnston' <david.g.johnston@xxxxxxxxx>
Cc: pgsql-admin@xxxxxxxxxxxxxxxxxxxx
Subject: RE: About Unicode IVS

Thank you for your reply.

I expect a length of one because the two code points are displayed as a single character.
This is the case for Unicode variation selectors and for combining characters.

Moto.

From: Graham Myers <gmyers@xxxxxxxxxxxxxxxxx> 
Sent: Tuesday, March 29, 2022 4:46 PM
To: 荒井元成 <n2029@xxxxxxxxxxxxx>; David G. Johnston <david.g.johnston@xxxxxxxxx>
Cc: pgsql-admin@xxxxxxxxxxxxxxxxxxxx
Subject: RE: About Unicode IVS

Why do you expect the concatenation of two characters to return a length of one?  

Graham Myers

From: 荒井元成 <n2029@xxxxxxxxxxxxx> 
Sent: 29 March 2022 05:35
To: 'David G. Johnston' <david.g.johnston@xxxxxxxxx>
Cc: pgsql-admin@xxxxxxxxxxxxxxxxxxxx
Subject: RE: About Unicode IVS

Thank you for your reply.
char_length() also returns 2 characters.

select char_length(U&'\+008FBA' || U&'\+0E0102');
char_length
-------------
           2
(1 row)

select length('辺󠄂');
length
--------
      2
(1 row)

select char_length('辺󠄂');
char_length
-------------
           2
(1 row)

$ psql -l
                                      List of databases
   Name    |  Owner  |     Encoding     | Collate  |       Ctype       |  Access privileges
-----------+---------+------------------+----------+-------------------+---------------------
 D209007   | D209007 | UTF8             | C        | C                 |
 postgres  | D209007 | UTF8             | C        | C                 |
 template0 | D209007 | UTF8             | C        | C                 | =c/D209007         +
           |         |                  |          |                   | D209007=CTc/D209007
 template1 | D209007 | UTF8             | C        | C                 | =c/D209007         +
           |         |                  |          |                   | D209007=CTc/D209007
(4 rows)

$ cat pgdata/PG_VERSION
13

Moto.

From: David G. Johnston <david.g.johnston@xxxxxxxxx> 
Sent: Tuesday, March 29, 2022 12:38 PM
To: 荒井元成 <n2029@xxxxxxxxxxxxx>
Cc: pgsql-admin@xxxxxxxxxxxxxxxxxxxx
Subject: Re: About Unicode IVS

On Monday, March 28, 2022, 荒井元成 <n2029@xxxxxxxxxxxxx> wrote:
Hi,

The length() function returns 2 for a string that I want to be counted as 1 character.
Can this be handled by changing a setting, such as the collation, as in SQL Server?
If not, and you know of a workaround (e.g., writing a custom function), I would appreciate as much information as you can provide.

Try char_length(text) instead.

David J.