Don Seiler <don@xxxxxxxxx> writes: > On Mon, Mar 16, 2020 at 10:28 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote: >> I don't think you should use pg_upgrade here at all. A dump/restore >> is really the only way to make sure that you have validly encoded data. > That is what I thought, and probably not what they'll want to hear given > the downtime involved. Even with parallel dump/restore jobs, I imagine it > will take quite a while (this first DB is almost 900GB). Yikes. Well, if there aren't obvious operational problems, it might be that the data is actually UTF8-clean, or almost entirely so. Maybe you could look at the problem as being one of validation. In that case, it'd be possible to consider not taking the production DB down, but just doing a pg_dump from it and seeing if you can restore somewhere else. If not, fix the broken data; repeat till clean. After that you could do pg_upgrade with a clear conscience. I think you'll still end up manually fixing the inconsistent datcollate/datctype settings though. > Is logical replication an option here? If the target DB were setup as > en_US.UTF-8 across the board, would logical replication safely replicate > and convert the data until we could then cut over? I think you need to make sure the data is clean first. I doubt that logical replication will magically fix any problems in data it's trying to push over, and I also doubt that we have any really good answer to what happens if a replication update fails due to bad data. regards, tom lane