On 2018-01-12 11:08:39 +0100, Francisco Olarte wrote: > C collation is like sorting raw bytes, it doesn't event sort > upper/lower case correctly Now you are falling into the same trap as Durumdara, calling an unintended sort order "not correct" ;-). C collation is certainly not what a "normal user" expects. It is therefore wrong for many applications (e.g., you couldn't use it to sort a telephone book), but it might be correct for others (e.g., it you need a stable, unambiguous sort order but don't care much about the order itself). (By coincidence, I also stumbled over the fact that the en_US.UTF-8 collation ignores punctuation characters and spaces at least in the first pass - took me a minute or so to figure out why I got an "inconsistent" (read: unexpected and not obvious for me) sort order.) Lexicographers are interested in sorting single words. Therefore they aren't interested in punctuation. They may also not be very interested in accents, because most languages have few words which differ only by an accent (and it the dictionary is printed on paper, the user will easily be able to scan up and down a few centimeters and find the right entry without knowing whether "à" is sorted before or after "á"). Somebody preparing an address list should care about punctuation: You would probably not expect to find "Smith-Jones, Amanda" between "Smith, John" and "Smith, Julia". And you probably want to sort "23 Main Street" before "180 Main Street". Which brings us back to Durumdara's example: I don't know his application, so I don't know what "normal users" of his application would expect, but to me those values look like two numbers separated by a slash. So I would expect '23/4567' to be sorted between '18/0212' and '180/2010', but of course the C collation doesn't do that: => select * from foo order by t collate "C"; ╔══════════╗ ║ t ║ ╟──────────╢ ║ 18/0113 ║ ║ 18/0212 ║ ║ 180/2010 ║ ║ 23/4567 ║ ╚══════════╝ (4 rows) It might be possible to define a custom collation for that, but in a case like this I would either split this field into two integer fields or use a function-based index. hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@xxxxxx | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment:
signature.asc
Description: PGP signature