Re: How well does PostgreSQL 9.6.1 support unicode?

James Zhou <james@xxxxxxxxxx> · Tue, 20 Dec 2016 23:17:56 -0800

both lc_type and lc_collate are en_US.UTF-8. Sorry for missing them in the original post.
I understand that collate has impact on sorting order, but the fact that char_length() is not returning the correct length in char for certain characters (non-BMP) is an indication that unicode is not fully supported. If char_length() is not working properly, I'd expect that substring() won't work either.

The PostgreSQL I am using is an AWS PostgreSQL RDS. I can check with AWS, but presumably that they are running PostgreSQL RDS on some flavor of lunix.

My client is PgAdmin 4 running on a Windows 7 machine. I understand that some client tools may not be able to display all unicode chars, but I do expect that the function ascii() return correct values  of the stored chars. 

For me the primary requirement is storing and retrieving all unicode characters as they are, and char_length() returns the correct values for all supported unicode chars. Correct sorting is nice-to-have. 

Any help to get unicode chars, particularly the mojos (0x1F478, 0x1F479), in and out of pg correctly is much appreciated. Thank you!

James   

On Tue, Dec 20, 2016 at 9:24 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
James Zhou <james@xxxxxxxxxx> writes:

>       - *But their sorting order seems to be undefined. Can anyone comment

>       the sorting rules?*

Well, it would depend on lc_collate, which you have not told us, and

it would also depend on how well your platform's strcoll() function

implements that collation; but you have not told us what platform this

is running on.

Most of the other behaviors you mention are also partly or wholly

dependent on which software you use with Postgres and whether you've

correctly configured that software.  So it's pretty hard to answer

this usefully with only this much info.

                        regards, tom lane