inserts bypass encoding conversion

"James Pang (chaolpan)" <chaolpan@xxxxxxxxx> · Wed, 16 Aug 2023 07:06:51 +0000

Hi, 
   With client_encoding=UTF8 and server_encoding=LATIN1, it looks like an insert whose value is built with chr(codepoint) bypasses encoding conversion. Is this expected? My test is below:

jamet=# delete from testutf8;
DELETE 1
jamet=# show client_encoding;
client_encoding
-----------------
UTF8
(1 row)

jamet=# show server_encoding;
server_encoding
-----------------
LATIN1
(1 row)

jamet=# \d testutf8
                     Table "public.testutf8"
Column |          Type          | Collation | Nullable | Default
--------+------------------------+-----------+----------+---------
test   | character varying(128) |           |          |

jamet=# insert into testutf8 values('…');
ERROR:  character with byte sequence 0xe2 0x80 0xa6 in encoding "UTF8" has no equivalent in encoding "LATIN1"    <<< the encoding conversion error is expected here

jamet=# insert into testutf8 values(chr(226)||chr(128)||chr(166));    <<< using chr(codepoint) works here; does it bypass encoding conversion?
INSERT 0 1
jamet=# set client_encoding='LATIN1';
SET
jamet=# show client_encoding;
client_encoding
-----------------
LATIN1
(1 row)

jamet=# show server_encoding;
server_encoding
-----------------
LATIN1
(1 row)

jamet=# select * from testutf8;
test
------
…
(1 row)

jamet=# insert into testutf8 values('…');    <<< client and server are both LATIN1 here, so there is no encoding conversion and the data is inserted
INSERT 0 1
jamet=# select * from testutf8;
test
------
…
…
(2 rows)

jamet=# select encode(test::bytea,'hex') from testutf8;    <<< both rows show the same bytes
 encode
--------
e280a6
e280a6
(2 rows)
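
For reference, here is a minimal Python sketch of how the stored bytes e280a6 read under each of the two encodings. It runs outside PostgreSQL, so it only illustrates the byte arithmetic and makes no claim about what the server does internally. It assumes Python's "latin-1" codec corresponds to the LATIN1 server encoding; the variable names are illustrative only.

```python
# Bytes reported by encode(test::bytea,'hex') in the session above.
stored = bytes.fromhex("e280a6")

# Read as UTF-8, the three bytes form a single character (U+2026).
as_utf8 = stored.decode("utf-8")

# Read as LATIN1 (ISO-8859-1), each byte is a separate character.
# Assumption: Python's "latin-1" codec matches the server's LATIN1.
as_latin1 = stored.decode("latin-1")

# Code points of the LATIN1 reading, for comparison with the chr() arguments.
latin1_codepoints = [ord(c) for c in as_latin1]

print(repr(as_utf8), len(as_utf8))
print(latin1_codepoints, len(as_latin1))
```

The code points from the LATIN1 reading are the same three values passed to chr() in the second insert, which may be relevant to why that statement did not raise the conversion error.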