On Tue, Mar 17, 2020 at 8:56 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Yikes. Well, if there aren't obvious operational problems, it might be
that the data is actually UTF8-clean, or almost entirely so. Maybe you
could look at the problem as being one of validation. In that case,
it'd be possible to consider not taking the production DB down, but just
doing a pg_dump from it and seeing if you can restore somewhere else.
If not, fix the broken data; repeat till clean. After that you could
do pg_upgrade with a clear conscience. I think you'll still end up
manually fixing the inconsistent datcollate/datctype settings though.
For this test, would we restore into an en_US.UTF-8/UTF8 database? Then, assuming no errors (or fixing any errors until clean), we change the datcollate/datctype settings in prod and proceed with pg_upgrade (obviously after testing all of that heavily)?
What are the ramifications of changing collation like that? Should we consider rebuilding indexes ASAP after that?
Don.
Don Seiler
www.seiler.us
www.seiler.us