On Mon, Aug 8, 2016 at 12:47 PM, Ioana Danes <ioanadanes@xxxxxxxxx> wrote:
On Mon, Aug 8, 2016 at 12:37 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:On 08/08/2016 09:28 AM, Ioana Danes wrote:
On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver
<adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@aklaver.com >> wrote:
On 08/08/2016 09:11 AM, Ioana Danes wrote:
Hi,
I suspect I am having a case of data corruption. Here are the
details:
I am running postgres 9.4.8:
postgresql94-9.4.8-1PGDG.rhel7.x86_64
postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
on CentOS Linux release 7.2.1511 (Core)
This is happening in a production environment but luckily on the
reporting database.
I have a cluster of 3 databases, db1 and db2 are masters and
replicate
between each other and also replicate to db3 (db1 <-> db2, db1
-> db3,
db2 -> db3).
For replication I am using Bucardo.
I would say this is more a question for the Burcardo list:
https://mail.endcrypt.com/mailman/listinfo/bucardo-general
<https://mail.endcrypt.com/mailman/listinfo/bucardo-general >
I am just not seeing that replicating two masters on to a single
database is going to end well.
Only one master is active at one time the other one is in stand by that
is a topic for another discussion but in our case that works well.
That was my first assumption, that it is a kind of a race condition or a
bug on replication but I quickly ruled that out because that does not
explain why when I filtered the table by transactionid = 75315815 it
shows one record with transactionid 75315811...
select gameplayid, transactionid, encodedplay from abrazo.matchgameplay
where transactionid in (75315815) order by transactionid;;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019271 | 75315815 | mix:9,0,9
160019269 | 75315815 | mix:9,8,9
160019267 | 75315815 | mix:9,2,2
160019265 | 75315815 | mix:2,2,8
160019263 | *75315811 *| backup:1,9,1
160019261 | 75315815 | backup:2,0,9
So I don't think it is a replication issue...
Other that, if I am following correctly, it is on the database(db3) being replicated to. The only way db3 is getting its data is through replication, is that correct?. On the master databases the data is correct.
OK, let's assume that what you're saying is correct and the replication has a bug, or corruption or whatever that is and the record gets created with transactionid = 75315811. Bucardo replication is trigger based and it is using a copy command to insert the new records into the replicated database.
Then how can I explain that my query select gameplayid, transactionid, encodedplay from abrazo.matchgameplay where transactionid in (75315815) order by transactionid; returns me a record with transactionid 75315811???Thanks,ioana
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
You did provide the full table structure, but this might possibly be a case of index corruption.
Try executing the following query and if any show up as invalid, just drop and rebuild.
SELECT n.nspname,
i.relname,
i.indexrelname,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
'INVALID'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE idx.indisvalid = FALSE
ORDER BY 1, 2;
Try executing the following query and if any show up as invalid, just drop and rebuild.
SELECT n.nspname,
i.relname,
i.indexrelname,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
'INVALID'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE idx.indisvalid = FALSE
ORDER BY 1, 2;
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.