On Mon, Aug 8, 2016 at 12:55 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 08/08/2016 09:47 AM, Ioana Danes wrote:
On Mon, Aug 8, 2016 at 12:37 PM, Adrian Klaver
<adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@aklaver.com >> wrote:
On 08/08/2016 09:28 AM, Ioana Danes wrote:
On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver
<adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@aklaver.com >
<mailto:adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com >>> wrote:
On 08/08/2016 09:11 AM, Ioana Danes wrote:
Hi,
I suspect I am having a case of data corruption. Here
are the
details:
I am running postgres 9.4.8:
postgresql94-9.4.8-1PGDG.rhel7.x86_64
postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
on CentOS Linux release 7.2.1511 (Core)
This is happening in a production environment but
luckily on the
reporting database.
I have a cluster of 3 databases, db1 and db2 are masters and
replicate
between each other and also replicate to db3 (db1 <->
db2, db1
-> db3,
db2 -> db3).
For replication I am using Bucardo.
I would say this is more a question for the Burcardo list:
https://mail.endcrypt.com/mailman/listinfo/bucardo-general
<https://mail.endcrypt.com/mailman/listinfo/bucardo-general >
<https://mail.endcrypt.com/mailman/listinfo/bucardo-general
<https://mail.endcrypt.com/mailman/listinfo/bucardo-general >>
I am just not seeing that replicating two masters on to a single
database is going to end well.
Only one master is active at one time the other one is in stand
by that
is a topic for another discussion but in our case that works well.
That was my first assumption, that it is a kind of a race
condition or a
bug on replication but I quickly ruled that out because that
does not
explain why when I filtered the table by transactionid = 75315815 it
shows one record with transactionid 75315811...
select gameplayid, transactionid, encodedplay from
abrazo.matchgameplay
where transactionid in (75315815) order by transactionid;;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019271 | 75315815 | mix:9,0,9
160019269 | 75315815 | mix:9,8,9
160019267 | 75315815 | mix:9,2,2
160019265 | 75315815 | mix:2,2,8
160019263 | *75315811 *| backup:1,9,1
160019261 | 75315815 | backup:2,0,9
So I don't think it is a replication issue...
Other that, if I am following correctly, it is on the database(db3)
being replicated to. The only way db3 is getting its data is through
replication, is that correct?. On the master databases the data is
correct.
OK, let's assume that what you're saying is correct and the replication
has a bug, or corruption or whatever that is and the record gets created
with transactionid = 75315811. Bucardo replication is trigger based and
it is using a copy command to insert the new records into the replicated
database.
Then how can I explain that my query select gameplayid, transactionid,
encodedplay from abrazo.matchgameplay where transactionid in (75315815)
order by transactionid; returns me a record with transactionid 75315811???
Corrupted index on db3?
yes
Might want to look in the db3 logs to see if anything pops out.
I checked the logs, no traces of errors or corruption.
I just do not know enough about Burcardo to be of much help beyond that.
it is trigger based, it saves the ids of the inserted record in a delta table and then on sync it creates copy commands to the slave. Even if there is a bug or corruption in that process I don't see how that corrupts the index on db3...
So it is either replication bug + index corruption on db3 or data corruption on db3...
In response to Melvin, the query returns no rows:
SELECT n.nspname,
i.relname,
i.indexrelname,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
'INVALID'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE idx.indisvalid = FALSE
ORDER BY 1, 2;
nspname | relname | indexrelname | type | ?column?
---------+---------+--------------+------+----------
(0 rows)
SELECT n.nspname,
i.relname,
i.indexrelname,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
'INVALID'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE idx.indisvalid = FALSE
ORDER BY 1, 2;
nspname | relname | indexrelname | type | ?column?
---------+---------+--------------+------+----------
(0 rows)
Thank you for your thoughts,
ioana
Thanks,
ioana
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@aklaver.com >
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx