Mixed Locales and Upgrading

Don Seiler <don@xxxxxxxxx> · Mon, 16 Mar 2020 09:24:13 -0500

Good morning,
I have a few clusters that need to be upgraded from PG 9.6 or 10 to 12. Normally I'd just plan to pg_upgrade the lot and be good to go. However I've found that quite a few (including our biggest/busiest database) have mixed locales. In the case of the biggest/busiest database, the cluster was created with locale en_US (NOT en_US.UTF-8), and so the databases have encoding LATIN1. 

However this database has encoding UTF8 while still having ctype and collation of en_US. I've since found that when this was last upgraded, they ran "update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'test';" to change the encoding. In my testing, pg_upgrade breaks when trying to restore this since UTF8 isn't supported in en_US for the CREATE DATABASE command used during pg_restore:

command: "/usr/lib/postgresql/12/bin/pg_restore" --host /var/lib/postgresql --port 50432 --username postgres --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_16385.custom" >> "pg_upgrade_dump_16385.log" 2>&1
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2164; 1262 16385 DATABASE test postgres
pg_restore: error: could not execute query: ERROR:  encoding "UTF8" does not match locale "en_US"
DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
Command was: CREATE DATABASE "test" WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US' LC_CTYPE = 'en_US';

Is there a way around this while still using pg_upgrade? My understanding is that a full dump and restore into a new DB with everything set to en_US.UTF-8 is the only to convert these (if I have to dump/restore, I wouldn't want to keep the mixed environment). Even with parallel jobs, I imagine that's a bit of downtime but I'll have to wait until I can get a copy of prod data to test with to be sure.

Is logical replication an option here? Either maintaining the mixed environment or converting everything to en_US.UTF-8? I'm relatively new in this shop but I'm told they didn't mean to use en_US and there's no reason they wouldn't want to just use the standard/default UTF-8.

Thanks,
Don.

-- 
Don Seiler
www.seiler.us