On 04/16/2018 08:16 AM, Keith Fiske wrote:
Running into an issue with helping a client upgrade from 8.3 to 10 (yes,
I know, please keep the out of support comments to a minimum, thanks :).
The old database was in SQL_ASCII and it needs to stay that way for now
unfortunately. The dump and restore itself works fine, but we're now
running into issues with some data returning encoding errors unless we
specifically set the client_encoding value to SQL_ASCII.
Looking at the 8.3 database, it has the client_encoding value set to
UTF8 and queries seem to work fine. Is this just a bug in the old 8.3
not enforcing encoding properly?e
AFAIK, SQL_ASCII basically means no encoding:
https://www.postgresql.org/docs/10/static/multibyte.html
"The SQL_ASCII setting behaves considerably differently from the other
settings. When the server character set is SQL_ASCII, the server
interprets byte values 0-127 according to the ASCII standard, while byte
values 128-255 are taken as uninterpreted characters. No encoding
conversion will be done when the setting is SQL_ASCII. Thus, this
setting is not so much a declaration that a specific encoding is in use,
as a declaration of ignorance about the encoding. In most cases, if you
are working with any non-ASCII data, it is unwise to use the SQL_ASCII
setting because PostgreSQL will be unable to help you by converting or
validating non-ASCII characters."
What client are you working with?
If psql then its behavior has changed between 8.3 and 10:
https://www.postgresql.org/docs/10/static/release-9-1.html#id-1.11.6.121.3
"
Have psql set the client encoding from the operating system locale by
default (Heikki Linnakangas)
This only happens if the PGCLIENTENCODING environment variable is not set.
"
https://www.postgresql.org/docs/10/static/app-psql.html
"If both standard input and standard output are a terminal, then psql
sets the client encoding to “auto”, which will detect the appropriate
client encoding from the locale settings (LC_CTYPE environment variable
on Unix systems). If this doesn't work out as expected, the client
encoding can be overridden using the environment variable PGCLIENTENCODING."
The other thing I noticed on the 10 instance was that, while the LOCALE
was set to SQL_ASCII, the COLLATE and CTYPE values for the restored
databases were en_US.UTF-8. Could this be having an affect? Is there any
way to see what these values were on the old 8.3 database? The
pg_database catalog does not have these values stored back then.
--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx