Re: Very puzzling sort behavior

Ken Tanzer <ken.tanzer@xxxxxxxxx> · Thu, 10 Sep 2015 13:49:18 -0700

On Thu, Sep 10, 2015 at 12:56 PM, Peter Geoghegan <peter.geoghegan86@xxxxxxxxx> wrote:
On Thu, Sep 10, 2015 at 12:51 PM, Ken Tanzer <ken.tanzer@xxxxxxxxx> wrote:
> OK, can one of you help me out in understanding this?  I would have thought that given "CLARK," and "CLARKE" that the comma would get compared against the E and come first.  End of story, before we even get to anything farther in the string.  What am I missing?

That's only how it works with the C locale. Otherwise, there are
complicated rules to weigh things like space and punctuation (and
accents/diacritics) less prominently than primary alphabetical
ordering. This is often useful. Anyway, based on what you say here, I
think you should actually "ORDER BY name_last, name_first".

--
Regards,
Peter Geoghegan

Thanks.  A little more help would be appreciated.  First a little context:
What I mailed out what a boiled down example.  In reality, what I have is a ton of tables with a client_id in them, and a convenience function client_name(client_id) that returns the name_last, name_first string (plus an alias if it exists).  client_name is used all over the place in both views and in an app that uses the database.  There is a similar, also heavily used, staff_name function.  Eliminating the use of these functions is a non-starter for me--I'd much rather live with the existing sort behavior, especially as no one has ever pointed this out despite over a decade of use.

I'm hoping to sort change this behavior with as minimal a change as possible (e.g., minimal potential for unexpected side effects or breakage).  I was hoping to just add a COLLATE "C" within the function:

CREATE OR REPLACE FUNCTION client_name( cid int4 ) RETURNS text AS $$
     -- client is a view that draws from tbl_client.  name_full is the field with the name_last, name_first data in it
     SELECT name_full COLLATE "C" FROM client WHERE client_id=$1;
$$

but that seems to have no effect.  And sure enough the documentation seems to back that up. ("The collation assigned to a function or operator's combined input expressions is also considered to apply to the function or operator's result, if the function or operator delivers a result of a collatable data type.")  So this may be wishful thinking, but is there any other way to specify the collation of a function result?  Specifying the collation every time the function is used is likely a no-go for me too.

Alternatively, it seems I could create new databases with a C collation and then move my data into them.  This seems a bit drastic, although possible.  I'd again be worried about the breakage/side effects.  And actually, will this work?  (i.e., can you use pg_dump to populate a new database with a different locale?)

Are there any other potential solutions, pitfalls or considerations that come to mind?  Any thoughts welcome.  And as I said, if there's not a good way to do this I'll probably leave it alone.  Thanks.

Ken

-- 
AGENCY Software  
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tanzer@xxxxxxxxxxxxxxxxxxx
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.