Hello postgres experts,
We are running a test that periodically kills the postgres process
abruptly (equivalent to kill -9) and restarts it.
After running this test for 24 hours or so, we see duplicate primary key
entries in a postgres table.
We detect this when a separate process loads the primary key entries
into an internal hash-table data structure.
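For illustration, this is the kind of check such a loader performs. It is a minimal sketch, not our actual loader: it assumes the rows arrive as (pk_col1, pk_col2, pk_col3) tuples, for example from `SELECT pk_col1, pk_col2, pk_col3 FROM table_foo`, and `load_pk_table` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple

# (pk_col1, pk_col2, pk_col3), matching the table_foo_pkey column order.
PrimaryKey = Tuple[int, int, int]


def load_pk_table(
    rows: Iterable[PrimaryKey],
) -> Tuple[Dict[PrimaryKey, int], List[PrimaryKey]]:
    """Load primary-key tuples into a hash table (hypothetical loader).

    Returns the table (key -> number of times seen) and the list of keys
    that were seen more than once, in the order they were first duplicated.
    """
    table: Dict[PrimaryKey, int] = {}
    duplicates: List[PrimaryKey] = []
    for key in rows:
        if key in table:
            # The PRIMARY KEY constraint should make this impossible, so any
            # hit here means the server returned the same key twice.
            table[key] += 1
            if table[key] == 2:
                duplicates.append(key)
        else:
            table[key] = 1
    return table, duplicates
```

Feeding it the same key twice, e.g. `(627708949163497688, 1, 13467)`, reports that key once in `duplicates`, which is how the problem first shows up for us.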
Before hitting this issue, we see the following warning messages in pg_log:
17365 2015-03-24 03:01:42.729 GMTWARNING: page is not marked all-visible but visibility map bit is set in relation "table_foo" page 12
17365 2015-03-24 03:01:42.729 GMTWARNING: page is not marked all-visible but visibility map bit is set in relation "table_foo" page 13
Some information about the schema:
- This table can contain up to 150k entries.
- *IMPORTANT*: We constantly insert new entries and remove older entries
from the table.
Relevant columns in table_foo
-----------------------------------------------------------------------------
pk_col3 | bigint | not null default 0::bigint
pk_col1 | bigint | not null default 0::bigint
pk_col2 | bigint | not null default 0::bigint
"table_foo_pkey" PRIMARY KEY, btree (pk_col1, pk_col2, pk_col3)
There are 3 other indexes on non-primary key columns in the table.
Duplicate entries
db=# select pk_col1, pk_col2, pk_col3, count(1) from table_foo group by
pk_col1, pk_col2, pk_col3 having count(1) > 1;
      pk_col1       | pk_col2 | pk_col3 | count
--------------------+---------+---------+-------
 627708949163497688 |       1 |   13467 |     2
 627708949163497688 |       4 |   13566 |     2
 627708949163497688 |     266 |   13565 |     2
(3 rows)
The query planner uses an index-only scan:
sodb=# explain select pk_col1, pk_col2, pk_col3, count(1) from table_foo
group by pk_col1, pk_col2, pk_col3 having count(1) > 1 order by pk_col3;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Sort  (cost=166.25..167.97 rows=689 width=24)
   Sort Key: pk_col3
   ->  HashAggregate  (cost=125.16..133.77 rows=689 width=24)
         Filter: (count(1) > 1)
         ->  Index Only Scan using table_foo_pkey on table_foo  (cost=0.00..113.36 rows=944 width=24)
(5 rows)
When a non-primary-key column (creation_time) is also selected, we don't
get duplicate entries. In that case the query planner uses a sequential
scan on table_foo:
sodb=# select pk_col1, pk_col2, pk_col3, creation_time, count(1) from
table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1 order
by pk_col3;
 pk_col1 | pk_col2 | pk_col3 | creation_time | count
---------+---------+---------+---------------+-------
(0 rows)
sodb=# explain select pk_col1, pk_col2, pk_col3, creation_time, count(1)
from table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1
order by pk_col3;
                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=174.33..176.06 rows=689 width=32)
   Sort Key: pk_col3
   ->  HashAggregate  (cost=133.24..141.85 rows=689 width=32)
         Filter: (count(1) > 1)
         ->  Seq Scan on table_foo  (cost=0.00..121.44 rows=944 width=32)
(5 rows)
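To compare the two access paths directly, the same duplicate query can be run once with the planner's default choice and once with index access disabled for the session, so that the rows are read from the heap by a sequential scan. This is a minimal sketch, assuming a DB-API 2.0 connection such as psycopg2; `duplicates_by_plan` is a hypothetical helper, while `enable_indexonlyscan`, `enable_indexscan` and `enable_bitmapscan` are the standard planner settings (all present in 9.2).

```python
# Minimal sketch: run the same duplicate-key query through two access paths.
# Assumes `conn` is a DB-API 2.0 connection (e.g. psycopg2) to the database
# that holds table_foo; duplicates_by_plan is a hypothetical helper name.

DUP_QUERY = (
    "SELECT pk_col1, pk_col2, pk_col3, count(1) FROM table_foo "
    "GROUP BY pk_col1, pk_col2, pk_col3 HAVING count(1) > 1"
)

# Planner settings that keep the second run away from table_foo_pkey.
INDEX_SETTINGS = ("enable_indexonlyscan", "enable_indexscan", "enable_bitmapscan")


def duplicates_by_plan(conn):
    """Return (duplicates via the default plan, duplicates via a Seq Scan)."""
    cur = conn.cursor()
    try:
        # First run: the planner is free to choose the Index Only Scan.
        cur.execute(DUP_QUERY)
        via_default = set(cur.fetchall())

        # Second run: disable index paths for this session only, so the
        # rows come straight from the heap.
        for name in INDEX_SETTINGS:
            cur.execute("SET " + name + " = off")
        cur.execute(DUP_QUERY)
        via_heap = set(cur.fetchall())

        # Put the session back to its defaults.
        for name in INDEX_SETTINGS:
            cur.execute("RESET " + name)
    finally:
        cur.close()
    return via_default, via_heap
```

A non-empty first set together with an empty second set would match what the two psql sessions above show: the extra rows appear only when table_foo is read through the index.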
We ran an experiment in which we reindex the offending table on every
postgres startup, and we don't see the issue after the reindex.
This leads us to believe that the index is corrupted but the actual data
in the table is fine.
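A startup step of that kind could look roughly like the sketch below. It is not our actual code: it assumes a DB-API 2.0 connection such as psycopg2, and `reindex_on_startup` is a hypothetical name. It also re-runs the duplicate check afterwards, since a rebuilt index should no longer report the extra rows.

```python
# Minimal sketch of a "reindex on startup" step, run once after the postgres
# server has come back up. Assumes `conn` is a DB-API 2.0 connection
# (e.g. psycopg2); reindex_on_startup is a hypothetical helper name.

DUP_QUERY = (
    "SELECT pk_col1, pk_col2, pk_col3, count(1) FROM table_foo "
    "GROUP BY pk_col1, pk_col2, pk_col3 HAVING count(1) > 1"
)


def reindex_on_startup(conn):
    """Rebuild all indexes on table_foo, then re-run the duplicate check.

    Returns the duplicate rows still reported after the rebuild; an empty
    list is what we expect after a successful rebuild.
    """
    cur = conn.cursor()
    try:
        # Rebuilds table_foo_pkey and the other indexes from the heap.
        # Writes to table_foo are blocked while this runs. If the heap itself
        # held duplicate live keys, rebuilding the unique primary key index
        # would be expected to fail with an error instead.
        cur.execute("REINDEX TABLE table_foo")
        cur.execute(DUP_QUERY)
        remaining = cur.fetchall()
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
    return remaining
```

The fact that the rebuild succeeds at all is part of why we think the heap data is intact and only the index path is returning the extra rows.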
Some information about our postgres setup:
- Version: 9.2.0
- We use the standard configuration, with shared_buffers set to 32MB and
checkpoint_timeout set to 1min.
- In this particular case, replication is not enabled.
Let me know if more information is needed to help understand this issue.
Any help or pointers will be appreciated.
Thanks,
Bankim.
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)