On 15/01/2014 12:36, Amit Langote wrote:
> On Wed, Jan 15, 2014 at 7:39 PM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
>> On 15/01/2014 10:10, Gábor Farkas wrote:
>>> hi,
>>>
>>> when i create an unique-constraint on a varchar field, how exactly
>>> does postgresql compare the texts? i'm asking because in UNICODE there
>>> are a lot of complexities about this..
>>>
>>> or in other words, when are two varchars equal in postgres? when their
>>> bytes are? or some algorithm is applied?
>>
>> By default, it is "whatever the operating system thinks it's right".
>> PostgreSQL doesn't have its own collation code, it uses the OS's locale
>> support for this.
>>
>
> Just to add to this, whenever strcoll() (a locale aware comparator)
> says two strings are equal, postgres re-compares them using strcmp().
> See following code snippet off
> src/backend/utils/adt/varlena.c:varstr_cmp() -
>
>     /*
>      * In some locales strcoll() can claim that nonidentical strings are
>      * equal. Believing that would be bad news for a number of reasons,
>      * so we follow Perl's lead and sort "equal" strings according to
>      * strcmp().
>      */
>     if (result == 0)
>         result = strcmp(a1p, a2p);

That seems odd and inefficient. Why would it be necessary? I would think
indexing (and other collation-sensitive operations) don't care what the
actual collation result is for arbitrary blobs of strings, as long as
they are stable?
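
For reference, the tie-break described in the quoted comment can be
sketched as a standalone comparator. This is only a minimal illustration
and not the PostgreSQL source: the name cmp_with_tiebreak is hypothetical,
and it assumes plain NUL-terminated strings, whereas varstr_cmp() does more
work before it reaches the quoted lines.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: compare two NUL-terminated
 * strings with the locale-aware strcoll(), and fall back to the bytewise
 * strcmp() only when strcoll() reports a tie.
 */
int
cmp_with_tiebreak(const char *a, const char *b)
{
    int result;

    assert(a != NULL && b != NULL);

    result = strcoll(a, b);         /* ordering under the current locale */
    if (result == 0)
        result = strcmp(a, b);      /* break the tie on the raw bytes */
    return result;
}
```

Written this way, the function returns 0 only when the two strings are
byte-identical, whatever the active locale reports.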