Greg Stark <gsstark@xxxxxxx> writes:
> Tom Lane <tgl@xxxxxxxxxxxxx> writes:
>
> > If that does change the results, it indicates you've got strings which
> > are bytewise different but compare equal according to strcoll(). We've
> > seen this and other misbehaviors from some locale definitions when faced
> > with data that is invalid per the encoding the locale expects.
>
> There are plenty of non-bytewise-identical strings that do legitimately
> compare equal in various locales. Does the hash code hash strxfrm or the
> original bytes?

Hm. Some experimentation shows that, at least with glibc's locale
definitions, the strings that I thought compared equal don't actually
compare equal. Capitalization, punctuation, and white space are basically
ignored in general in non-C locales, but they do seem to compare non-equal
when they're the only differentiating factor.

Is this guaranteed by any spec? Or is counting on this behaviour unsafe?

If it's legal for strcoll to compare two bytewise-different strings as
equal, then the hash function really ought to be calling strxfrm before
hashing, or else it will be inconsistent with the comparison. It doesn't
seem to be doing so currently.

I find it interesting that Perl has faced this same dilemma and chose to
override the locale definition in this case. If the locale definition
compares two strings as equal, then Perl does a bytewise comparison and
uses that to break ties. This guarantees that non-bytewise-identical
strings don't compare equal. I suspect they did it for a similar reason
too, namely keeping the semantics in sync with Perl hashes.

Postgres could follow that model. I think it would solve any
inconsistencies just fine and not cause problems. However, it would be
visible to users, and it may be considered a bug if the locale really does
claim the strings are equal but Postgres doesn't agree. On the other hand,
I think it would perform better than a lot of extra calls to strxfrm,
since the extra memcmp would only rarely kick in.
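
To make the Perl-style tie-break concrete, here is a minimal sketch in C. The function name cmp_with_tiebreak is hypothetical and this is not the actual Postgres comparison code. It also assumes NUL-terminated strings, which is why it falls back to strcmp rather than the memcmp a length-counted text value would need.

```c
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Compare two NUL-terminated strings with the current locale's collation,
 * and break ties bytewise so that only byte-identical strings return 0.
 */
int
cmp_with_tiebreak(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result != 0)
        return result;      /* the locale already distinguishes them */

    /* The locale claims equality: fall back to a bytewise comparison. */
    return strcmp(a, b);
}
```

With a comparison like this, equality implies bytewise identity, so hashing the original bytes stays consistent with it and no strxfrm call is needed on the hashing side.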
-- 
greg