Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 14 Sep 2020 20:26:02 -0400

raf <raf@xxxxxxx> writes:
> On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> On the other hand, the very same thing could be said of database names
>> and role names, yet we have never worried much about whether those were
>> encoding-safe when viewed from databases with different encodings, nor
>> have there been many complaints about the theoretical unsafety.  So maybe
>> this is just overly anal-retentive and we should drop the restriction,
>> or at least pass through data that doesn't appear to be invalidly
>> encoded.

> Perhaps recode database/role names from the source
> database's encoding into utf8, and then recode from utf8
> to the destination database's encoding?

A lot of people seem to believe that transcoding through utf8
is 100% safe.  They're wrong :-( --- the Japanese, at least,
have reason not to trust it, because of the existence of multiple
incompatible conversion standards.  And you're still left with the
question of what to do when the destination encoding hasn't
got the character.
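
To make both problems concrete, here is a small sketch using Python's bundled codecs (not PostgreSQL's conversion routines, whose tables may differ). It assumes Python's "shift_jis" and "cp932" codecs as stand-ins for two competing Japanese conversion standards; the helper name can_encode is made up for illustration.

```python
# The same two bytes, decoded with two widely used "Shift-JIS" tables.
raw = b"\x81\x60"

via_sjis = raw.decode("shift_jis")   # JIS-style mapping
via_cp932 = raw.decode("cp932")      # Microsoft's variant of the mapping

# The two tables disagree about which Unicode code point these bytes mean,
# so "convert to utf8 and back" depends on whose table you used.
print(hex(ord(via_sjis)), hex(ord(via_cp932)))


def can_encode(text: str, dest_encoding: str) -> bool:
    """Return True if every character of text exists in dest_encoding."""
    try:
        text.encode(dest_encoding)
        return True
    except UnicodeEncodeError:
        return False


# Second problem: the destination encoding may simply lack the character.
print(can_encode("\u65e5", "latin-1"))   # a kanji has no Latin-1 equivalent
print(can_encode("\u00e9", "latin-1"))   # e-acute does
```

Going through utf8 does not remove either problem: the first is a disagreement about the mapping itself, and the second still needs a policy for characters the destination cannot represent.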

Moreover, this is all moderately expensive unless the encodings in
question are already utf8 or latin1.  So if we go this way I'd
prefer to do it as I said above -- just drop or question-mark-ize
any characters that don't pass validation in the recipient DB.
That's fairly cheap and it will work perfectly in the typical case
where the whole cluster is on one encoding anyway.
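
A rough sketch of that "validate in the recipient's encoding, else question-mark-ize" idea, written in Python purely for illustration. The function name sanitize_name and the error-handler name "qmark" are hypothetical, and Python's codec validation is only an approximation of what the server's own encoding verification would accept or reject.

```python
import codecs


def _question_mark(exc):
    # Replace each invalid byte sequence with a single '?' and resume
    # decoding right after it.
    if isinstance(exc, UnicodeDecodeError):
        return ("?", exc.end)
    raise exc


# "qmark" is a made-up handler name registered for this sketch only.
codecs.register_error("qmark", _question_mark)


def sanitize_name(raw: bytes, recipient_encoding: str, drop: bool = False) -> str:
    """Interpret raw name bytes in the recipient database's encoding.

    No transcoding is attempted: bytes that validate pass through
    unchanged, and anything that fails validation is either dropped
    or shown as '?'.
    """
    errors = "ignore" if drop else "qmark"
    return raw.decode(recipient_encoding, errors=errors)


# Typical case: name stored in the same encoding, nothing is altered.
print(sanitize_name("caf\u00e9".encode("utf-8"), "utf-8"))

# A Latin-1 name viewed from a UTF-8 database: the stray byte is masked.
print(sanitize_name(b"caf\xe9", "utf-8"))
print(sanitize_name(b"caf\xe9", "utf-8", drop=True))
```

The point of the design is that it is a single validation pass with no conversion tables involved, so a cluster that uses one encoding throughout sees its names unchanged.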

			regards, tom lane