Search Postgresql Archives

Re: String comparison problem in select - too many results

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2018-01-12 11:08:39 +0100, Francisco Olarte wrote:
> C collation is like sorting raw bytes, it doesn't event sort
> upper/lower case correctly

Now you are falling into the same trap as Durumdara, calling an
unintended sort order "not correct" ;-).

C collation is certainly not what a "normal user" expects. It is
therefore wrong for many applications (e.g., you couldn't use it to
sort a telephone book), but it might be correct for others (e.g., it you
need a stable, unambiguous sort order but don't care much about the
order itself).

(By coincidence, I also stumbled over the fact that the en_US.UTF-8
collation ignores punctuation characters and spaces at least in the
first pass - took me a minute or so to figure out why I got an
"inconsistent" (read: unexpected and not obvious for me) sort order.)

Lexicographers are interested in sorting single words. Therefore they
aren't interested in punctuation. They may also not be very interested
in accents, because most languages have few words which differ only by
an accent (and it the dictionary is printed on paper, the user will
easily be able to scan up and down a few centimeters and find the right
entry without knowing whether "à" is sorted before or after "á").

Somebody preparing an address list should care about punctuation: You
would probably not expect to find "Smith-Jones, Amanda" between "Smith,
John" and "Smith, Julia". And you probably want to sort "23 Main Street"
before "180 Main Street".

Which brings us back to Durumdara's example: I don't know his
application, so I don't know what "normal users" of his application
would expect, but to me those values look like two numbers separated by
a slash. So I would expect '23/4567' to be sorted between '18/0212' and
'180/2010', but of course the C collation doesn't do that:

=> select * from foo order by t collate "C";
╔══════════╗
║    t     ║
╟──────────╢
║ 18/0113  ║
║ 18/0212  ║
║ 180/2010 ║
║ 23/4567  ║
╚══════════╝
(4 rows)

It might be possible to define a custom collation for that, but in a
case like this I would either split this field into two integer fields
or use a function-based index.

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@xxxxxx         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

Attachment: signature.asc
Description: PGP signature


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux