On Wed, Jan 15, 2014 at 9:02 PM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
> On 15/01/2014 12:36, Amit Langote wrote:
>> On Wed, Jan 15, 2014 at 7:39 PM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
>>> On 15/01/2014 10:10, Gábor Farkas wrote:
>>>> hi,
>>>>
>>>> when i create an unique-constraint on a varchar field, how exactly
>>>> does postgresql compare the texts? i'm asking because in UNICODE there
>>>> are a lot of complexities about this..
>>>>
>>>> or in other words, when are two varchars equal in postgres? when their
>>>> bytes are? or some algorithm is applied?
>>>
>>> By default, it is "whatever the operating system thinks it's right".
>>> PostgreSQL doesn't have its own collation code, it uses the OS's locale
>>> support for this.
>>>
>>
>> Just to add to this, whenever strcoll() (a locale aware comparator)
>> says two strings are equal, postgres re-compares them using strcmp().
>> See following code snippet off
>> src/backend/utils/adt/varlena.c:varstr_cmp() -
>
>>     /*
>>      * In some locales strcoll() can claim that nonidentical strings are
>>      * equal.  Believing that would be bad news for a number of reasons,
>>      * so we follow Perl's lead and sort "equal" strings according to
>>      * strcmp().
>>      */
>>     if (result == 0)
>>         result = strcmp(a1p, a2p);
>
> That seems odd and inefficient. Why would it be necessary? I would think
> indexing (and other collation-sensitive operations) don't care what the
> actual collation result is for arbitrary blobs of strings, as long as
> they are stable?
>

This has been the behavior for quite some time. It was introduced by
this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=656beff59033ccc5261a615802e1a85da68e8fad

--
Amit Langote

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
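
For anyone who wants to try the tie-break quoted above outside the
backend, here is a minimal standalone sketch in C. It is not the actual
varstr_cmp(): the function names collate_then_bytes() and strings_equal()
are made up for illustration, and it assumes plain NUL-terminated strings
and whatever locale the calling program has set.

```c
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's varstr_cmp()): compare two
 * NUL-terminated strings with the locale-aware strcoll() first, and
 * fall back to the bytewise strcmp() only when strcoll() reports a tie.
 */
int
collate_then_bytes(const char *a, const char *b)
{
    int result = strcoll(a, b);

    /* Some locales can report 0 here for strings that differ in bytes. */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}

/*
 * Under this rule two strings are equal only when the comparison,
 * including the strcmp() tie-break, comes out as 0.
 */
int
strings_equal(const char *a, const char *b)
{
    return collate_then_bytes(a, b) == 0;
}
```

With this rule the locale still decides the sort order, but only
byte-identical strings can compare as equal, which is the answer to the
original unique-constraint question.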