On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote:
> Perhaps the fast-path check is a bad idea, but fixing this is not just
> a matter of removing that. If we subscribe to strcoll's worldview then
> we have to conclude that *text strings are not hashable*, because
> strings that should be "equal" may have different hash codes. And at
> least in the current PG code, that's not something we can flip on and off
> depending on the locale --- texteq would have to be marked non hashable
> in the system catalogs, meaning a big performance hit for *everybody*
> even if their locale is not this weird.

That's true, in the sense that unconverted strings are not hashable.
This is what strxfrm() was created for: it returns the sorting key for a
string. A quick C program (attached) demonstrates that in the hu_HU
locale these two strings are indeed equal, whereas in en_AU they are not.

$ LC_ALL=hu_HU ./strxfrm potyty potty
String 1: potyty
Strxfrm 1: " ((\x01\x02\x02\x02\x02\x01\x02\x02\x02\x02
String 2: potty
Strxfrm 2: " ((\x01\x02\x02\x02\x02\x01\x02\x02\x02\x02

$ LC_ALL=en_AU ./strxfrm potyty potty
String 1: potyty
Strxfrm 1: \x1B\x1A\x1F$\x1F$\x01\x02\x02\x02\x02\x02\x02\x01\x02\x02\x02\x02\x02\x02
String 2: potty
Strxfrm 2: \x1B\x1A\x1F\x1F$\x01\x02\x02\x02\x02\x02\x01\x02\x02\x02\x02\x02

I think the only way to make indexes properly locale-sensitive would be
to either use strcoll() in all cases, or store the result from strxfrm()
in the index. Anything else will break somewhere.

In any case, we first need to determine which answer is correct before
we run off trying to fix it. This is Glibc 2.3.2 on a Debian Linux
system.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@xxxxxxxxx> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

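#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include <locale.h>

/* Print a string, escaping non-printable bytes as \xNN. */
void DumpString(const unsigned char *s)
{
    while (*s)
    {
        if (isprint(*s))
            printf("%c", *s);
        else
            printf("\\x%02X", *s);
        s++;
    }
}

int main(int argc, char *argv[])
{
    char buffer[100];
    int i;

    /* Take the collation locale from the environment (LC_ALL etc.). */
    setlocale(LC_ALL, "");

    for (i = 1; i < argc; i++)
    {
        printf("String %2d: ", i);
        DumpString((const unsigned char *) argv[i]);
        /* If the key does not fit, the buffer contents are unspecified. */
        if (strxfrm(buffer, argv[i], sizeof(buffer)) >= sizeof(buffer))
        {
            printf("\nStrxfrm %d: (key too long for buffer)\n", i);
            continue;
        }
        printf("\nStrxfrm %d: ", i);
        DumpString((const unsigned char *) buffer);
        printf("\n");
    }
    return 0;
}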
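
To make the "store the result from strxfrm()" idea concrete, here is a minimal sketch of hashing the sort key instead of the raw bytes. It is not PostgreSQL code: the names collation_key, fnv1a, locale_hash and locale_equal are hypothetical, and FNV-1a is only a stand-in for whatever hash function would really be used. The point is only that two strings for which strcoll() returns 0 produce identical strxfrm() keys, and therefore identical hash codes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd strxfrm() sort key for s under the current LC_COLLATE
 * locale, or NULL if allocation fails. The caller frees the result.
 * (Hypothetical helper, not part of any existing API.) */
char *collation_key(const char *s)
{
    assert(s != NULL);
    size_t n = strxfrm(NULL, s, 0);   /* bytes needed, excluding the NUL */
    char *key = malloc(n + 1);
    if (key != NULL)
        strxfrm(key, s, n + 1);
    return key;
}

/* 32-bit FNV-1a over a NUL-terminated byte string. Illustrative only;
 * any byte-wise hash would serve the same purpose here. */
uint32_t fnv1a(const char *p)
{
    uint32_t h = 2166136261u;         /* FNV offset basis */
    for (; *p; p++)
    {
        h ^= (unsigned char) *p;
        h *= 16777619u;               /* FNV prime */
    }
    return h;
}

/* Hash the sort key rather than the raw string, so strings that collate
 * as equal get the same hash code. Returns 0 if the key cannot be built. */
uint32_t locale_hash(const char *s)
{
    char *key = collation_key(s);
    uint32_t h = (key != NULL) ? fnv1a(key) : 0;
    free(key);
    return h;
}

/* Equality in strcoll()'s worldview. */
int locale_equal(const char *a, const char *b)
{
    return strcoll(a, b) == 0;
}
```

A caller would run setlocale(LC_ALL, "") first, as the attached program does. Under a locale like the hu_HU one shown above, locale_hash("potyty") and locale_hash("potty") should then agree, at the cost of building a key for every string hashed.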