Don Seiler <don@xxxxxxxxx> writes: > On Tue, Mar 17, 2020 at 8:56 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote: >> Yikes. Well, if there aren't obvious operational problems, it might be >> that the data is actually UTF8-clean, or almost entirely so. Maybe you >> could look at the problem as being one of validation. > For this test, would we restore into an en_US.UTF-8/UTF8 database? Then, > assuming no errors (or fixing any errors until clean), we change the > datcollate/datctype settings in prod and proceed with pg_upgrade (obviously > after testing all of that heavily)? Yeah, that's the basic idea. > What are the ramifications of changing collation like that? Should we > consider rebuilding indexes ASAP after that? Text indexes would definitely be at risk here. I'm not really certain how bad the problem would be. Do you have a feeling for how much of the data is 100% ASCII? If you could be sure of that for any given column, you wouldn't have to reindex indexes on that column. regards, tom lane