At 01:40 PM 12/16/2005 -0500, Tom Lane wrote:
> Nobody's said anything about giving up locale-sensitive sorting. The
> question is about locale-sensitive equality: does it really make sense
> that 'tty' = 'tyty'? Would your answer change in the context
> '/dev/tty' = '/dev/tyty'? Are you willing to *not have access* to a
> text comparison operator that will make the distinction?
>
> I'm inclined to think that this is more like the occasional need for
> accent-insensitive comparisons. It seems generally agreed that you want
> something like smash('ab') = smash('áb') rather than making the
> strings equal in all contexts.

I agree.
I would prefer that everything be compared without any
collation/corruption by default, and that there be a function to pick the
desired comparison behaviour. (Can all of that functionality be done with
the COLLATE clause?)
This is because most databases are multi-locale, whether the humans are
aware of it or not:
the computer "locale", human locale #1, an unknown/international locale,
human locale #2, ...
In a column for license keys, "tty" should rarely be the same as "tyty".
In a column for base64 data (crypto hashes, etc) "tty" should NEVER be the
same as "tyty".
In a column for domain names, I doubt it is clear whether you want to match
tty.ibm.hu just because tyty.ibm.hu exists.
But in a column for license owner names, one might want "tty" and "tyty" to
be the same - one might have to have a multicolumn index depending on the
owner's locale of choice.
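
To make the distinction concrete, here is a minimal Python sketch (not
PostgreSQL code) of "exact by default, normalize on request". The name
smash() is borrowed from the quoted text above; its body is only my
assumption of what such a function might do, namely strip accents. It does
not model locale rules such as the one that equates "tty" and "tyty".

```python
import unicodedata


def smash(text: str) -> str:
    """Hypothetical accent-stripping normalizer (an assumption, not a real
    PostgreSQL function): decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_equal(a: str, b: str) -> bool:
    # Default behaviour: plain code point comparison, no locale involved.
    return a == b


def smashed_equal(a: str, b: str) -> bool:
    # Opt-in behaviour: the caller explicitly asks for normalization.
    return smash(a) == smash(b)
```

With this split, license keys and base64 data would use exact_equal(), and
only a column like owner names would opt in to smashed_equal().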
For these reasons I recommend that initdb always pick "unmangled" text by
default, no matter what the locale setting is, and that users be advised
of the potential consequences of mangling, or I would even say corrupting,
all text in their databases by default.
Regards,
Link.