Re: Mixed Locales and Upgrading

Tom Lane <tgl@xxxxxxxxxxxxx> · Tue, 17 Mar 2020 10:45:50 -0400

Don Seiler <don@xxxxxxxxx> writes:
> On Tue, Mar 17, 2020 at 8:56 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Yikes.  Well, if there aren't obvious operational problems, it might be
>> that the data is actually UTF8-clean, or almost entirely so.  Maybe you
>> could look at the problem as being one of validation.

> For this test, would we restore into an en_US.UTF-8/UTF8 database? Then,
> assuming no errors (or fixing any errors until clean), we change the
> datcollate/datctype settings in prod and proceed with pg_upgrade (obviously
> after testing all of that heavily)?

Yeah, that's the basic idea.

> What are the ramifications of changing collation like that? Should we
> consider rebuilding indexes ASAP after that?

Text indexes would definitely be at risk here.  I'm not really certain
how bad the problem would be.  Do you have a feeling for how much of
the data is 100% ASCII?  If you could be sure of that for any given
column, you wouldn't have to reindex indexes on that column.

			regards, tom lane