Re: unexpected character used as group separator by to_char

Alvaro Herrera <alvherre@xxxxxxxxxxxxxx> · Wed, 10 Mar 2021 09:58:20 -0300

On 2021-Mar-10, Gavan Schneider wrote:

> On 10 Mar 2021, at 16:24, Alvaro Herrera wrote:
> 
> > That space (0xe280af) is U+202F, which appears to be used for French and
> > Mongolian languages (exclusively?).  It is quite possible that in the
> > future some other language will end up using some different whitespace
> > character, possibly breaking any code you write today -- the use of
> > U+202F appears to be quite recent.
> > 
> Drifting off topic a little. That's a proper code point for things that will
> benefit from the whitespace but should still stay together.
> Also it’s not that new, added in 1999 — https://codepoints.net/U+202F

I probably got misled on this whole thing by these change proposals.
https://www.unicode.org/L2/L2019/19116-clarify-nnbsp.pdf
https://www.unicode.org/L2/L2020/20008-core-text.pdf
Apparently prior to this, they (?) had been using/recommending
THIN SPACE U+2009 as separator, which is not non-breaking.
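
For anyone wanting to check which of these characters they are looking
at, here is a minimal Python sketch using only the standard unicodedata
module. The describe() helper is a made-up name for illustration, not
something from PostgreSQL or any driver.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, general category, UTF-8 bytes in hex)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.encode("utf-8").hex(),
    )

# The byte sequence 0xe280af reported earlier in the thread, decoded as UTF-8.
nnbsp = bytes.fromhex("e280af").decode("utf-8")

# Compare it with THIN SPACE and the ordinary NO-BREAK SPACE.
for ch in (nnbsp, "\u2009", "\u00a0"):
    print(describe(ch))
```

All three fall in general category Zs (space separator), which is
probably a more useful thing to test for than any single code point.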

Anyway, it reinforces my point that it's not impossible that some other
locale definition could use U+2009 when printing numbers, or even some
other kind of spacing entity in non-Latin languages etc.  So I think
that for truly robust handling you should separate the thing you use for
display from the thing you use to talk to the database.
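
A rough sketch of what I mean, in Python purely for illustration (the
function names are hypothetical and this is not tied to any particular
driver): keep the number as a number when talking to the database, and
apply a separator only at display time. If a displayed string really
has to be parsed back, it seems safer to drop anything Unicode
classifies as a space separator than to hard-code U+202F.

```python
import unicodedata
from decimal import Decimal

def format_for_display(value, sep="\u202f"):
    """Group the integer digits in threes using `sep`. For display only."""
    # Format with ',' as a placeholder, then swap in the chosen separator.
    return f"{value:,}".replace(",", sep)

def parse_displayed(text, decimal_point="."):
    """Best-effort parse of a displayed number.

    Drops every character in Unicode category Zs (space separators),
    so U+202F, U+2009, U+00A0 and a plain space are all handled alike.
    """
    cleaned = "".join(c for c in text if unicodedata.category(c) != "Zs")
    return Decimal(cleaned.replace(decimal_point, "."))

amount = Decimal("1234567.89")       # what goes to the database
shown = format_for_display(amount)   # what the user sees

print(shown)
print(parse_displayed(shown) == amount)
```

The value bound as a query parameter would be amount itself, never
shown, so a change of separator in some locale definition only affects
what is printed.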

> And the thin space is part of the international standard for breaking up
> large numbers (from 1948), specifically no dots or commas should be used in
> this role. The dot or comma is only to be used for the decimal point!

Interesting; I didn't know this.

-- 
Álvaro Herrera       Valdivia, Chile
"This is a foot just waiting to be shot"                (Andrew Dunstan)