Good morning,
I have a few clusters that need to be upgraded from PG 9.6 or 10 to 12. Normally I'd just plan to pg_upgrade the lot and be good to go. However I've found that quite a few (including our biggest/busiest database) have mixed locales. In the case of the biggest/busiest database, the cluster was created with locale en_US (NOT en_US.UTF-8), and so the databases have encoding LATIN1.
However this database has encoding UTF8 while still having ctype and collation of en_US. I've since found that when this was last upgraded, they ran "update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'test';" to change the encoding. In my testing, pg_upgrade breaks when trying to restore this since UTF8 isn't supported in en_US for the CREATE DATABASE command used during pg_restore:
command: "/usr/lib/postgresql/12/bin/pg_restore" --host /var/lib/postgresql --port 50432 --username postgres --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_16385.custom" >> "pg_upgrade_dump_16385.log" 2>&1
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2164; 1262 16385 DATABASE test postgres
pg_restore: error: could not execute query: ERROR: encoding "UTF8" does not match locale "en_US"
DETAIL: The chosen LC_CTYPE setting requires encoding "LATIN1".
Command was: CREATE DATABASE "test" WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US' LC_CTYPE = 'en_US';
--
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2164; 1262 16385 DATABASE test postgres
pg_restore: error: could not execute query: ERROR: encoding "UTF8" does not match locale "en_US"
DETAIL: The chosen LC_CTYPE setting requires encoding "LATIN1".
Command was: CREATE DATABASE "test" WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US' LC_CTYPE = 'en_US';
Is there a way around this while still using pg_upgrade? My understanding is that a full dump and restore into a new DB with everything set to en_US.UTF-8 is the only to convert these (if I have to dump/restore, I wouldn't want to keep the mixed environment). Even with parallel jobs, I imagine that's a bit of downtime but I'll have to wait until I can get a copy of prod data to test with to be sure.
Is logical replication an option here? Either maintaining the mixed environment or converting everything to en_US.UTF-8? I'm relatively new in this shop but I'm told they didn't mean to use en_US and there's no reason they wouldn't want to just use the standard/default UTF-8.
Thanks,
Don.
Don Seiler
www.seiler.us
www.seiler.us