Re: Mixed Locales and Upgrading

Tom Lane <tgl@xxxxxxxxxxxxx> · Tue, 17 Mar 2020 09:56:01 -0400

Don Seiler <don@xxxxxxxxx> writes:
> On Mon, Mar 16, 2020 at 10:28 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> I don't think you should use pg_upgrade here at all.  A dump/restore
>> is really the only way to make sure that you have validly encoded data.

> That is what I thought, and probably not what they'll want to hear given
> the downtime involved. Even with parallel dump/restore jobs, I imagine it
> will take quite a while (this first DB is almost 900GB).

Yikes.  Well, if there aren't obvious operational problems, it might be
that the data is actually UTF8-clean, or almost entirely so.  Maybe you
could look at the problem as being one of validation.  In that case,
it'd be possible to consider not taking the production DB down, but just
doing a pg_dump from it and seeing if you can restore somewhere else.
If not, fix the broken data; repeat till clean.  After that you could
do pg_upgrade with a clear conscience.  I think you'll still end up
manually fixing the inconsistent datcollate/datctype settings though.
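
As a rough illustration of the dump-and-restore-elsewhere loop described above, here is a minimal Python sketch that only builds the command lines for one validation pass and does not run anything. The function name, the database names, the dump directory, and the job count are all hypothetical. It assumes connection settings come from the usual environment (PGHOST and friends), and that the scratch database lives on some non-production server.

```python
from typing import List


def build_validation_commands(
    source_db: str,
    scratch_db: str,
    dump_dir: str,
    jobs: int = 4,
    locale: str = "en_US.UTF-8",
) -> List[List[str]]:
    """Return argv lists for one dump-and-restore validation pass.

    Nothing is executed here; the caller decides how and where to run them.
    All names passed in are placeholders chosen by the caller.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return [
        # Parallel pg_dump requires the directory output format.
        ["pg_dump", "--format=directory", f"--jobs={jobs}",
         f"--file={dump_dir}", source_db],
        # Scratch database with a consistent UTF8 encoding and locale;
        # template0 is used so the locale can differ from template1's.
        ["createdb", "--template=template0", "--encoding=UTF8",
         f"--locale={locale}", scratch_db],
        # Errors reported during restore point at data to fix in the source.
        ["pg_restore", f"--dbname={scratch_db}", f"--jobs={jobs}", dump_dir],
    ]


if __name__ == "__main__":
    # Hypothetical names, printed for inspection only.
    for argv in build_validation_commands("proddb", "scratch_check", "/tmp/proddb.dump"):
        print(" ".join(argv))
```

If the restore reports encoding errors, the idea would be to fix those rows in the source, drop the scratch database, and repeat until a pass comes through clean.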

> Is logical replication an option here? If the target DB were set up as
> en_US.UTF-8 across the board, would logical replication safely replicate
> and convert the data until we could then cut over?

I think you need to make sure the data is clean first.  I doubt that
logical replication will magically fix any problems in data it's trying
to push over, and I also doubt that we have any really good answer to
what happens if a replication update fails due to bad data.
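
For a quick first look at whether the data is clean, one option (an assumption on top of the above, not a substitute for an actual test restore) is to scan a plain-format dump file for byte sequences that do not decode as UTF-8. The helper name below is hypothetical, and Python's decoder is not guaranteed to accept or reject exactly what the server would; for example, it accepts NUL bytes.

```python
from typing import Iterable, Iterator, Tuple


def find_invalid_utf8(lines: Iterable[bytes]) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, byte_offset) for each line that is not valid UTF-8.

    line_number is 1-based; byte_offset is the 0-based position of the
    first undecodable byte within that line.
    """
    for line_number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield line_number, exc.start


if __name__ == "__main__":
    # A Latin-1 encoded "e acute" (0xE9) is not valid UTF-8 on its own.
    sample = [b"ok\n", b"caf\xe9\n", "caf\u00e9\n".encode("utf-8")]
    for line_number, offset in find_invalid_utf8(sample):
        print(f"line {line_number}: invalid byte at offset {offset}")
```

A file opened in binary mode (`open(path, "rb")`) can be passed directly, since iterating it yields one bytes object per line.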

			regards, tom lane