Re: Inexplicable duplicate rows with unique constraint

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Thu, 16 Jan 2020 09:27:25 -0800

On 1/16/20 9:24 AM, Richard van der Hoff wrote:

On 16/01/2020 17:12, Magnus Hagander wrote:
On Thu, Jan 16, 2020 at 6:08 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

Richard van der Hoff <richard@xxxxxxxxxx> writes:
I'm trying to track down the cause of some duplicate rows in a table
which I would expect to be impossible due to a unique constraint. I'm
hoping that somebody here will be able to suggest something I might have
missed.

Since these are text columns, one possibility you should be looking into
is that the indexes have become corrupt due to a change in the operating
system's sorting rules for the underlying locale.  I don't recall details
at the moment, but I do remember that a recent glibc update changed the
sorting rules for some popular locale settings.  If an installation had
applied such an update underneath an existing database, you'd have a
situation where existing entries in an index are not in-order according
to the new behavior of the text comparison operators, leading to havoc
because btree searching relies on the entries being correctly sorted.
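
To make that failure mode concrete, here is a minimal sketch in Python. It
is not PostgreSQL's btree code, and the two comparison functions are
hypothetical stand-ins, not the actual glibc rules before or after any
update. It assumes an "old" ordering that ignores punctuation on the first
pass and a "new" ordering that compares code points. All keys are plain
ASCII. The sorted list plays the role of the unique index.

```python
def old_rules(s):
    # Hypothetical "old" collation: compare alphanumerics first,
    # then fall back to the raw string as a tie-breaker.
    return ("".join(ch for ch in s if ch.isalnum()), s)


def new_rules(s):
    # Hypothetical "new" collation: plain code-point order.
    return s


def find(index, value, key):
    """Binary search that trusts `index` to be sorted under `key`.

    Returns (found, position), where position is the insertion point.
    """
    lo, hi = 0, len(index)
    target = key(value)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(index) and index[lo] == value
    return found, lo


def insert_unique(index, value, key):
    """Enforce uniqueness by searching first, as a unique index would."""
    found, pos = find(index, value, key)
    if found:
        raise ValueError("duplicate key: %r" % value)
    index.insert(pos, value)


# The index is built while the old rules are in effect.
index = sorted(["10", "1-1", "11", "1-2", "12"], key=old_rules)

# Under the old rules the existing entry is found, so a second
# insert of "1-2" would be rejected.
found_before, _ = find(index, "1-2", old_rules)

# After the rules change underneath the index, the same search walks
# the wrong way and misses the entry, so the duplicate is accepted.
found_after, _ = find(index, "1-2", new_rules)
insert_unique(index, "1-2", new_rules)
duplicates = index.count("1-2")
```

Here found_before is True, found_after is False, and the final insert
succeeds, leaving two copies of "1-2" in a structure that was supposed to
guarantee uniqueness.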

See https://wiki.postgresql.org/wiki/Locale_data_changes for hints on
which Linux distros updated when.

Right, thanks to all who have suggested this.

It seems like a plausible explanation, but it's worth noting that all the
indexed data here is plain ASCII, despite being in text columns. I'm
surprised that a change in collation rules would change the sorting of
such strings, and hence that it could lead to this problem. Am I naive?

In psql what does:

\l the_database_name

show?

To answer Adrian's question: the lengths of the values in the indexed
columns are identical between the duplicated rows.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx