Re: inserts bypass encoding conversion

Tom Lane <tgl@xxxxxxxxxxxxx> · Wed, 16 Aug 2023 22:40:32 -0400

"James Pang (chaolpan)" <chaolpan@xxxxxxxxx> writes:
> So, insert into values(chr(226)||chr(128)||chr(166)) actually got stored in the database as LATIN1, as a single-byte sequence, but when I query select * from testutf8, it got converted to a UTF8 three-byte sequence first?

There are no LATIN1 characters whose UTF8 representation is longer
than 2 bytes, so no.
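A quick way to check that, sketched in Python rather than SQL: LATIN1 codes 0-255 coincide with Unicode U+0000-U+00FF, so encoding every LATIN1 character as UTF-8 and taking the longest result gives the bound. For contrast, the UTF8 sequence 0xe2 0x80 0xa6 from the question decodes to the single character U+2026 (HORIZONTAL ELLIPSIS), which needs three bytes.

```python
# Longest UTF-8 encoding of any LATIN1 (ISO 8859-1) character.
max_utf8_len = max(
    len(bytes([b]).decode("latin-1").encode("utf-8")) for b in range(256)
)
print(max_utf8_len)  # 2

# The three-byte UTF-8 sequence from the question is one character, U+2026.
ellipsis = bytes([0xE2, 0x80, 0xA6]).decode("utf-8")
print(hex(ord(ellipsis)), len(ellipsis.encode("utf-8")))  # 0x2026 3
```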

I think your fundamental misunderstanding is supposing that this:

	chr(226)||chr(128)||chr(166)

produces something equivalent to the UTF8 sequence 0xe2 0x80 0xa6.
It will not, no matter which server encoding you are dealing with.
It will produce something that is three separate characters
according to the server encoding.  In LATIN1, that could well be
the byte sequence 0xe2 0x80 0xa6, but *that byte sequence does not
mean the same thing that it would mean in UTF8 encoding*.
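To make that concrete, here is a small Python model (not PostgreSQL code) of what chr(226)||chr(128)||chr(166) yields under each server encoding. It assumes chr() returns the character whose code is the argument: in UTF8 that is the Unicode code point, in LATIN1 the byte value. Because LATIN1 codes coincide with the first 256 Unicode code points, both databases end up holding the same three characters, stored as different bytes, and neither holds an ellipsis. The function name pg_chr_concat is hypothetical.

```python
def pg_chr_concat(codes, server_encoding):
    """Hypothetical model of chr(a)||chr(b)||... in a given server encoding.

    Returns the resulting text and the bytes the server would store for it.
    """
    text = "".join(chr(c) for c in codes)  # one character per chr() call
    return text, text.encode(server_encoding)


codes = [226, 128, 166]
latin1_text, latin1_bytes = pg_chr_concat(codes, "latin-1")
utf8_text, utf8_bytes = pg_chr_concat(codes, "utf-8")

print(len(latin1_text), latin1_bytes.hex(" "))  # 3 e2 80 a6
print(len(utf8_text), utf8_bytes.hex(" "))      # 3 c3 a2 c2 80 c2 a6

# The LATIN1 bytes happen to match the UTF-8 bytes of U+2026,
# but the stored characters are not an ellipsis.
print(latin1_bytes == "\u2026".encode("utf-8"))  # True
print(latin1_text == "\u2026")                   # False
```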

You also seem not to grasp the fact that an encoding conversion
will happen between your client and the server if client_encoding
is different from server_encoding.  Because of that, the output of
a SELECT command doesn't prove much of anything here.
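A rough Python model of that conversion step, assuming a LATIN1 database holding the three characters discussed above (stored bytes 0xe2 0x80 0xa6). convert_for_client is a hypothetical stand-in for the server's encoding conversion, simplified to a decode/encode round trip. With client_encoding set to UTF8 the client receives six bytes; with client_encoding left at LATIN1 it receives the raw three bytes, and a terminal that actually expects UTF-8 will display those as an ellipsis. Either way, what appears on screen says little about what is stored.

```python
def convert_for_client(stored, server_encoding, client_encoding):
    """Hypothetical, simplified model of server-to-client encoding conversion."""
    return stored.decode(server_encoding).encode(client_encoding)


stored = bytes([0xE2, 0x80, 0xA6])  # three LATIN1 characters in the database

to_utf8_client = convert_for_client(stored, "latin-1", "utf-8")
to_latin1_client = convert_for_client(stored, "latin-1", "latin-1")

print(to_utf8_client.hex(" "))    # c3 a2 c2 80 c2 a6
print(to_latin1_client.hex(" "))  # e2 80 a6

# A terminal that assumes UTF-8 misreads the unconverted LATIN1 bytes.
shown = to_latin1_client.decode("utf-8")
print(shown == "\u2026")          # True
```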

			regards, tom lane