Search Postgresql Archives

Re: varchar sort ordering ignore blanks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sunday 20 January 2008 01:07, Tom Lane wrote:
> "Luca Arzeni" <l.arzeni@xxxxxxxxxxx> writes:
> > Is there any way to consider blanks meaningfull AND sort properly locale
> > specific vowels ?
>
> This isn't a Postgres question, it's a locale question.  (If you try,
> you'll find that sort(1) sorts the same as we do in any given locale.)
>
> I imagine you could create a custom locale definition that acts this
> way, but I have no idea about the degree of wizardry involved.
> "man localedef" would probably be a place to start.
>
> If you come up with a reasonably simple recipe for this, please post
> it here, as you're not the first to have asked and you likely won't
> be the last ...
>
> 			regards, tom lane

Thanks Tom,
I gave a look at localedef, but it seems too much complex for my 
understanding. 

I understood that (under debian etch) lc_collate is defined for posix in file:
/usr/share/i18n/locales/POSIX

Here actually I can find an undestandable LC_COLLATE directive that defines 
all chars one after the other in ASCII order. Thats fine.

Then I looked at it_IT locale definition and noticed that this locale (and 
many if not all other locales, as far as I can see) have a collation order 
inherited from the file:
/usr/share/i18n/locales/iso14651_t1

This seems to be a iso standard, but it is not easily understandable by me. It 
includes macros and defines also sorting for many character sets, including 
arabic and grec.

I think I've found the problem (space is actually ignored and put in a 
separate list from other chars), but I'm not able to understand what shoud be 
the proper value to put in the row.

At line 537 I can read the following lines:
order_start <SPECIAL>;forward;backward;forward;forward,position
#
# Tout caractère non précisément défini sera considéré comme caractère spécial
# et considéré uniquement au dernier niveau.
#
# Any character not precisely specified will be considered as a special
# character and considered only at the last level.
# <U0000>......<U7FFFFFFF> IGNORE;IGNORE;IGNORE;<U0000>......<U7FFFFFFF>
#
# SYMB.                                N° GLY
#
<U0020> IGNORE;IGNORE;IGNORE;<U0020> # 32 <SP>

so I guess space is beeing ignored.

At line 810 I can read:
<U2079> <9>;<BAS>;<EMI>;IGNORE # 197 <9S>
#
<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a
<U00AA> <a>;<PCL>;<EMI>;IGNORE # 199 ª
<U00E1> <a>;<ACA>;<MIN>;IGNORE # 200 á
<U00E0> <a>;<GRA>;<MIN>;IGNORE # 201 à
<U00E2> <a>;<CIR>;<MIN>;IGNORE # 202 â
<U00E3> <a>;<TIL>;<MIN>;IGNORE # 203 ã
<U00E4> <a>;<REU>;<MIN>;IGNORE # 204 ä
<U00E5> <a>;<RNE>;<MIN>;IGNORE # 205 å

So my guess is that if I could put properly the space AFTER the commented line 
and before the linte that starts with <U0061> (that is, before the lower "a") 
I could solve the problem.

May be that some locale wizard is listening around and can help me?

Thanks, Luca

-- 
Ing. Luca Arzeni
Amadego S.R.L.
tel.: 02 6193672
cell.: 339 8350298
mailto: l.arzeni@xxxxxxxxxxx

=== Start-of Internet E-mail Confidentiality Footer ===

L'uso non autorizzato di questo messaggio o dei suoi allegati e' vietato e 
potrebbe costituire reato.
Se ha ricevuto per errore questo messaggio, La preghiamo di informarci e di 
distruggerlo immediatamente coi suoi allegati.
Le dichiarazioni contenute in questo messaggio o nei suoi allegati non 
impegnano Amadego S.R.L. nei confronti del destinatario o di terzi.
Amadego S.R.L. non si assume alcuna responsabilita' per eventuali 
intercettazioni, modifiche o danneggiamenti del presente messaggio.

Any unauthorized use of this e-mail or any of its attachments is prohibited 
and could constitute an offence.
If you are not the intended addressee please advise immediately the sender and 
destroy the message and its attachments.
The contents of this message shall be understood as neither given nor endorsed 
by Amadego S.R.L.
Amadego S.R.L. does not accept liability for corruption, interception or 
amendment, if any, or the consequences thereof.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux