On 10/24/2015 12:35 PM, Chaz Yoon wrote:
I am seeing a duplicate, stale copy of the same row when performing a pg_dump or copying a specific table, but not when directly selecting from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client tools. It's happening on a users table, which has a primary key and enforces a unique email address: Table "public.users" Column | Type | Modifiers ---------------+-----------------------------+--------------------------------------------------- id | integer | not null default nextval('users_id_seq'::regclass) email | character varying(255) | not null default ''::character varying last_activity | timestamp without time zone | Indexes: "users_pkey" PRIMARY KEY, btree (id) "users_unique_email" UNIQUE, btree (email) I first noticed the problem when doing copying the table to another database. Roughly this: % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt % psql test_db < users.txt [...] ERROR: could not create unique index "users_pkey" DETAIL: Key (id)=(123) is duplicated. [...] ERROR: could not create unique index "users_unique_email" DETAIL: Key (email)=(this_user@xxxxxxx <mailto:this_user@xxxxxxx>) is duplicated. It appears there's some sort of duplicate record for a single user in the database. Checking the pg_dump output, I saw that a single user's record was being exported twice: % grep -i this_user@xxxxxxx <mailto:this_user@xxxxxxx> users.txt INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xxxxxxx <mailto:this_user@xxxxxxx>', '2015-10-21 10:32:15.997887'); INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xxxxxxx <mailto:this_user@xxxxxxx>', '2015-10-02 11:32:58.615743'); The rows were not exactly the same. Connecting to the source database directly, I tried this: remote_db=> select count(1) from users where id = 123; count ------- 1 (1 row) remote_db=> select count(1) from users where email = 'this_user@xxxxxxx <mailto:this_user@xxxxxxx>'; count ------- 1 (1 row) To eliminate any risk of it being a weird locking issue, I restored a snapshot of the database into a new RDS instance but I got the same results. I then tried the following: remote_db=> create table users_copy_with_indexes (like users including defaults including constraints including indexes including storage including comments); CREATE TABLE remote_db=> insert into users_copy_with_indexes select * from users; ERROR: duplicate key value violates unique constraint "users_copy_with_indexes_pkey" DETAIL: Key (id)=(123) already exists. However, when I created a copy without the indexes, I can see the duplicate rows: remote_db=> create table users_copy_without_indexes (like users); CREATE TABLE remote_db=> insert into users_copy_without_indexes select * from users; INSERT 0 523342 remote_db=> select count(1) from users_copy_without_indexes where id = 123; count ------- 2 (1 row) remote_db=> select count(1) from users_copy_without_indexes where email = 'this_user@xxxxxxx <mailto:this_user@xxxxxxx>'; count ------- 2 (1 row) Any suggestions for what to look for next? Is it table corruption?
I would say the smoking gun is the copy w/o indexes shows both records and the one with indexes only one. I would DROP/CREATE index on the original table, with the usual caveat that this does place a load on the table. Using Concurrently might help, but I would read the information here:
http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY Building Indexes Concurrently
Chaz
-- Adrian Klaver adrian.klaver@xxxxxxxxxxx -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general