Thank you for the explanation. Unicode always blows my mind 😊 The problem is that Postgres counts code points, and your example contains two of them.
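To make the distinction concrete, here is a minimal sketch in Python (outside Postgres, standard library only) that compares the number of code points with a rough count of displayed characters. The helper names are hypothetical, and the approach is only an approximation: it skips variation selectors and combining marks, and does not implement full grapheme cluster segmentation as defined by Unicode.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """True for code points in the two Unicode variation selector blocks."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def count_code_points(s: str) -> int:
    """Number of code points, which is what char_length reports in this thread."""
    return len(s)


def count_displayed_approx(s: str) -> int:
    """Approximate displayed characters (hypothetical helper, not a Postgres API).

    Skips variation selectors and characters with a non-zero combining class.
    This is a simplification, not full grapheme cluster segmentation.
    """
    return sum(
        1
        for ch in s
        if not is_variation_selector(ch) and unicodedata.combining(ch) == 0
    )


# U+8FBA followed by variation selector U+E0102, as in the thread's example.
sample = "\u8FBA\U000E0102"

print(count_code_points(sample))       # 2
print(count_displayed_approx(sample))  # 1
```

A base letter followed by a combining accent, such as "e" plus U+0301, is likewise two code points but counts as one displayed character here.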
From: 荒井元成 <n2029@xxxxxxxxxxxxx>
Thank you for your reply.
Because the two code points are displayed as a single character. The same applies to any sequence that uses Unicode variation selectors or combining characters.
Moto.
From: Graham Myers <gmyers@xxxxxxxxxxxxxxxxx>
Why do you expect the concatenation of two characters to return a length of one?
From: 荒井元成 <n2029@xxxxxxxxxxxxx>
Thank you for your reply. It still returns 2.
select char_length(U&'\+008FBA' || U&'\+0E0102');
 char_length
-------------
           2
(1 row)

select length('辺󠄂');
 length
--------
      2
(1 row)

select char_length('辺󠄂');
 char_length
-------------
           2
(1 row)
$ psql -l
                              List of databases
   Name    |  Owner  | Encoding | Collate | Ctype |  Access privileges
-----------+---------+----------+---------+-------+---------------------
 D209007   | D209007 | UTF8     | C       | C     |
 postgres  | D209007 | UTF8     | C       | C     |
 template0 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
(4 rows)

$ cat pgdata/PG_VERSION
13
Moto.
From: David G. Johnston <david.g.johnston@xxxxxxxxx>
Try char_length(text) instead.
David J.