Duplicate Values or Not?!

John Seberg <johnseberg@xxxxxxxxx> · Fri, 16 Sep 2005 13:24:25 -0700 (PDT)

I recently tried to CREATE a UNIQUE INDEX and could
not, due to duplicate values:

CREATE UNIQUE INDEX usr_login ON usr (login);

To try to find the offending row(s), I then executed
the following:

SELECT count(*), login FROM usr GROUP BY login
ORDER BY 1 DESC;
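The same check can also be sketched outside the database, for example against the exported file. Group the values on their exact bytes and keep only groups larger than one, which is the equivalent of adding HAVING count(*) > 1 to the query above. This is a minimal Python sketch. It assumes the logins have already been read as raw bytes, and the function name is hypothetical.

```python
from collections import Counter


def duplicate_logins(logins: list[bytes]) -> dict[bytes, int]:
    """Return logins that occur more than once, compared byte for byte.

    Hypothetical helper: mirrors GROUP BY login ... HAVING count(*) > 1
    under the assumption that "equal" means identical bytes.
    """
    counts = Counter(logins)
    # Keep only groups with more than one member, largest first.
    dupes = [(k, n) for k, n in counts.items() if n > 1]
    return dict(sorted(dupes, key=lambda kv: kv[1], reverse=True))


print(duplicate_logins([b"pena", b"smith", b"pena"]))  # {b'pena': 2}
```

An empty result here, as with the GROUP BY, only means no two values are byte-for-byte identical.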

The GROUP BY didn't group anything, which suggested to
me that there were no duplicate values: the query
returned the same number of rows as a simple SELECT
count(*) FROM usr reports.

This suggests to me that PostgreSQL is not using the
same method for determining duplicates when GROUPING
as when INDEXing.
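Here is an illustration of how two notions of "equal" can disagree on the same data. It is not a description of what PostgreSQL actually does internally. The "lossy" rule below is hypothetical: it stands in for any comparison that cannot interpret certain bytes and ignores them. The two byte values are made up.

```python
def exact_key(value: bytes) -> bytes:
    # Byte-for-byte identity: different bytes mean different values.
    return value


def lossy_key(value: bytes) -> bytes:
    # Hypothetical rule that silently drops non-ASCII bytes, standing in
    # for a comparison that cannot interpret those characters.
    return bytes(b for b in value if b < 0x80)


def has_duplicates(values: list[bytes], key) -> bool:
    """True if two or more values map to the same key under the given rule."""
    keys = [key(v) for v in values]
    return len(set(keys)) < len(keys)


names = [b"Pe\xf1a", b"Pe\x96a"]  # two made-up mistranslated "Pena" values
print(has_duplicates(names, exact_key))  # False: mirrors the GROUP BY result
print(has_duplicates(names, lossy_key))  # True: would mirror the index failure
```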

I dug a little deeper. The best candidates I found for
possible duplicates involve characters that did not
translate well. IIRC, the basis was the name Pena,
which looked like Pe?a. I suspect either the original
data was not encoded properly or my export didn't
handle encodings properly. The two Penas used
different characters in the 3rd position, neither of
which was translated correctly.
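To see exactly which bytes differ between two values that both display as Pe?a, it can help to dump them in hex. This is a minimal sketch. The function name and the sample byte values are hypothetical and are not taken from the actual data.

```python
def differing_positions(a: bytes, b: bytes) -> list[tuple[int, str, str]]:
    """List (position, hex byte in a, hex byte in b) wherever a and b differ.

    Positions are 1-based to match the "3rd position" wording; a missing
    byte (when the lengths differ) is shown as "--".
    """
    diffs = []
    for i in range(max(len(a), len(b))):
        x = f"{a[i]:02x}" if i < len(a) else "--"
        y = f"{b[i]:02x}" if i < len(b) else "--"
        if x != y:
            diffs.append((i + 1, x, y))
    return diffs


# Made-up sample values; both might print as "Pe?a" on a terminal that
# cannot display the third byte.
print(differing_positions(b"Pe\xf1a", b"Pe\x96a"))  # [(3, 'f1', '96')]
```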

I loaded data from another database vendor (4th
Dimension) into PostgreSQL 8.0.3, which I had
compiled from source with the default configuration.
This was on Yellow Dog Linux 4.0.1.

I brought the same data into PostgreSQL 8.0.1 on Mac
OS X (binary from entropy.ch) and did NOT get this
UNIQUE INDEX failure.

I'm sure my problems go deeper than the INDEX
failure and involve the accuracy of the conversion,
but in the short term I would like to know: what is
different? Both are SQL_ASCII databases. I tried
importing into a UNICODE database, but that produced
a real mess of errors (during COPY).
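Before retrying the import into a UNICODE database, one way to find the rows COPY is likely to reject is to scan the export for lines that are not valid UTF-8. This is a minimal sketch. It assumes the export is a line-oriented text file read as raw bytes, and the function name is hypothetical.

```python
def invalid_utf8_lines(data: bytes) -> list[tuple[int, int, str]]:
    """Return (line number, byte offset within line, offending byte in hex)
    for each line of data that does not decode as UTF-8."""
    problems = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that failed to decode.
            problems.append((lineno, err.start, f"{line[err.start]:02x}"))
    return problems


sample = b"smith\tJohn\npena\tPe\xf1a\n"  # made-up tab-separated rows
print(invalid_utf8_lines(sample))  # [(2, 7, 'f1')]
```

For a real file, pass it the bytes from `open(path, "rb").read()`. Only the first bad byte on each line is reported, which is usually enough to locate the problem rows.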

I realize I need to learn about encodings, my source
data, etc., but I'm looking for hints. Is anybody
experienced with exported 4th Dimension data
containing some foreign-language text?

Thanks,
