Re: Regular expression to UPPER() a lower case string

"Peter J. Holzer" <hjp-pgsql@xxxxxx> · Sat, 10 Dec 2022 15:08:43 +0100

On 2022-12-10 13:44:37 +0000, Gianni Ceccarelli wrote:
> On 2022-12-10 "Peter J. Holzer" <hjp-pgsql@xxxxxx> wrote:
> > > * your logic only works by accident for some languages (try to
> > > upcase a `ß` or a `ı`)  
> > 
> > This is also true of upper() and lower() and SQL does provide those.
> 
> Well…
> 
> > select upper('ı');
> ┌───────┐
> │ upper │
> ├───────┤
> │ I     │
> └───────┘
> (1 row)

This is, I think, universally correct. A better example would be
upper('i') which should be 'İ' in Turkish and 'I' in most other
languages.
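
A minimal sketch of that case, assuming a PostgreSQL build with ICU
support in which the predefined collations "tr-x-icu" and "en-x-icu"
are available (the collation names are an assumption about the
installation, not something shown in this thread):

```sql
-- Assumption: ICU-enabled build; "tr-x-icu" and "en-x-icu" are ICU
-- collations that were imported when the cluster was initialized.
SELECT upper('i' COLLATE "tr-x-icu") AS turkish,
       upper('i' COLLATE "en-x-icu") AS english;
```

Under Turkish rules the first column should come out as 'İ', the
second as 'I'.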

> > select upper('ß');
> ┌───────┐
> │ upper │
> ├───────┤
> │ ß     │
> └───────┘
> (1 row)

This is incorrect according to German spelling rules. It should be
either 'SS' (traditionally) or 'ẞ' (since the introduction of the
upper-case sharp s). However, given the long absence of the ẞ from
official German orthography and the lack of reversibility of the ß → SS
mapping it has been (and still is) quite common to leave the ß in lower
case.
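
A minimal sketch of the reversibility problem, assuming an ICU-enabled
build with the "de-x-icu" collation (ICU applies the full Unicode case
mapping, so the result may differ from the libc-style output quoted
above):

```sql
-- Assumption: ICU-enabled build with the "de-x-icu" collation present.
-- up:         upper-cased form of ß under German ICU rules
-- round_trip: the same value lower-cased again
SELECT upper('ß' COLLATE "de-x-icu")        AS up,
       lower(upper('ß' COLLATE "de-x-icu")) AS round_trip;
```

If the upper-casing yields 'SS', the round trip gives 'ss' rather than
the original 'ß'.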

> > select upper('ä');
> ┌───────┐
> │ upper │
> ├───────┤
> │ Ä     │
> └───────┘
> (1 row)

Correct (in German[1] and probably any other language).

So, what's the point you are trying to make?

> Of course all of this is dependent of locale, too.

Right. But why would that be different for regexp_replace than for
upper/lower?

        hp

[1] Although I have one book which uses ä, ö, ü for lower case but Ae,
    Oe, Ue for upper case letters.

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"