Hmmm. Yes, the Unicode rules do seem rather strict about staying compatible with the past! I just made the following fix to my Python code, which builds the upper case alphabet from the lower case one:
uc_alphabet = lc_alphabet.replace('ß', 'ẞ').upper()
So far, German is the only language I have found with a lower case letter whose upper-cased form comes back as the same character.
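For anyone wanting to try this, here is a minimal sketch of that fix. The function name upper_alphabet and the sample German alphabet string are made up for illustration; only str.replace() and str.upper() are real Python. It relies on Python's str.upper() expanding 'ß' to 'SS' by default, which is why the replacement has to happen first.

```python
def upper_alphabet(lc_alphabet: str) -> str:
    """Upper-case an alphabet string, mapping 'ß' to capital sharp s 'ẞ'."""
    # Swap in 'ẞ' (U+1E9E) before calling upper(); otherwise Python's
    # full case mapping would turn 'ß' into the two letters 'SS'.
    return lc_alphabet.replace('ß', 'ẞ').upper()


# Hypothetical sample alphabet, used only for this illustration.
german_lc = 'abcdefghijklmnopqrstuvwxyzäöüß'
german_uc = upper_alphabet(german_lc)
```

With this, the upper case alphabet keeps the same length as the lower case one, so the letters still line up position by position.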
Thanks,
Celia McInnis
On Mon, Mar 13, 2023 at 6:54 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
"Peter J. Holzer" <hjp-pgsql@xxxxxx> writes:
> On 2023-03-13 17:38:51 -0400, Celia McInnis wrote:
>> I would be really happy if postgresql had an upper case version of the ß
>> german character.
> But the 'ß' is a bit special as it is usually uppercased to 'SS'
> (although 'ẞ' is now officially allowed, too).
> Apparently your (and my) locale doesn't uppercase ß at all, which isn't
> correct according to German spelling rules but was very common in the
> last decades.
Our code for libc locales doesn't support upcasing 'ß' to 'SS',
because it uses towupper(), which can only manage
one-character-to-one-character transformations. It should work for
upcasing to 'ẞ', but as you say, you need to find a locale that thinks
that should happen.
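The difference between a one-character-to-one-character mapping and a full case mapping can be sketched in Python. This is only an illustration, not the Postgres code: upper_one_to_one is a hypothetical helper that imitates a per-character API by refusing any result longer than one character.

```python
def upper_one_to_one(text: str) -> str:
    """Imitate a per-character upcasing API (illustration only)."""
    out = []
    for ch in text:
        up = ch.upper()
        # A one-to-one API has no way to return two characters,
        # so a character like 'ß' is left unchanged.
        out.append(up if len(up) == 1 else ch)
    return ''.join(out)


word = 'straße'
per_char = upper_one_to_one(word)  # 'ß' survives untouched
full = word.upper()                # full mapping expands 'ß' to 'SS'
```

Here per_char keeps the original length while full grows by one character, which is the kind of result a one-to-one interface cannot express.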
You might have better luck if you have a version of Postgres that
supports ICU and you can use an ICU locale. That code path doesn't
appear to have any hard-wired assumption about how many characters
in convert to how many out.
regards, tom lane