Re: changelog deadlock replication failures with DNA

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Mahadevan,

I think you hit a new bug that I was able to reproduce.

The problem is an incorrect handling of operation return code when there is an error in SLAPI_PLUGIN_BE_TXN_POST_ADD_FN.
When the operation succeeds in the DB, DS (as a master) updates the CL and RUV in SLAPI_PLUGIN_BE_TXN_POST_ADD_FN phase.
At this moment the operation is successful. However the update of CL/RUV may fail (your case). It correctly aborts the update of the CL AND the txn related to the operation in the DB (like if the operation never happened)... but it looks like there is a bug where it keeps the operation returned code.
The operation return code should be a failure, but it is successful.

I reproduced the problem with a debug version where I simulate a DB_DEADLOCK failure
[17/Jun/2013:11:07:21 +0200] - slapd started.  Listening on All Interfaces port 24502 for LDAP requests
[17/Jun/2013:11:08:23 +0200] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (51bed207000000010000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[17/Jun/2013:11:08:23 +0200] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for uid=td,dc=com (uniqid: 74589a01-d72d11e2-b4c2bdbf-b0a22af5, optype: 16) to changelog csn 51bed207000000010000


[tbordaz@pctbordaz userRoot]$ tail -20 /var/log/dirsrv/slapd-master_cl/access
...
[17/Jun/2013:11:08:23 +0200] conn=2 op=1 ADD dn="uid=td,dc=com"
[17/Jun/2013:11:08:23 +0200] conn=2 op=1 RESULT err=0 tag=105 nentries=0 etime=0 csn=51bed207000000010000

I do not know why your environment is prone to trigger db_deadlock (lot of replica agreements, VM, slow disks...).
I think the best way to progress is that you fill a ticket/bug so that we may track the issue. Note this bug is possibly affecting all operations (ADD/MOD/MODRDN/DEL)

best regards
thierry
On 06/15/2013 01:03 AM, Mahadevan, Venkat wrote:

> If the operation fails to write into the changelog, the operation fails. In your case, it means that the ldapclient should receive an error.
> So it is like if the operation never happened and is not replicated.

Hi Thierry,

 

That’s what I would expect to, but that does not seem to be the case. In my access log on the master server below,

the client successfully adds the entry to the master server and receives an error code return of 0:

 

[12/Jun/2013:13:36:59 -0700] conn=19048 op=0 BIND dn="cn=Directory Manager" method=128 version=3

[12/Jun/2013:13:36:59 -0700] conn=19048 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager"

[12/Jun/2013:13:36:59 -0700] conn=19048 op=1 ADD dn="uid=jmeter325,dc=tst,dc=id,dc=ubc,dc=ca"

[12/Jun/2013:13:37:04 -0700] conn=19048 op=1 RESULT err=0 tag=105 nentries=0 etime=5 csn=51b8dbec002b02bd0000

 

In the error log at the same time the above happens:

 

[12/Jun/2013:13:37:02 -0700] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica dc=tst,dc=id,dc=ubc,dc=ca: LDAP error - 51

[12/Jun/2013:13:37:04 -0700] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=51b8dbec002b02bd0000) failed (rc=-30994 (DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))

[12/Jun/2013:13:37:04 -0700] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (51b8dbec002b02bd0000); db error - -30994 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock

[12/Jun/2013:13:37:04 -0700] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for uid=jmeter325,dc=tst,dc=id,dc=ubc,dc=ca (uniqid: c0a7827c-d39f11e2-879add5e-44f44922, optype: 16) to changelog csn 51b8dbec002b02bd0000

 

The net result is that the entry is successfully added to the master server but will never replication to any of the consumers.

 

cheers,

 

VM


--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users

[Index of Archives]     [Fedora User Discussion]     [Older Fedora Users]     [Fedora Announce]     [Fedora Package Announce]     [EPEL Announce]     [Fedora News]     [Fedora Cloud]     [Fedora Advisory Board]     [Fedora Education]     [Fedora Security]     [Fedora Scitech]     [Fedora Robotics]     [Fedora Maintainers]     [Fedora Infrastructure]     [Fedora Websites]     [Anaconda Devel]     [Fedora Devel Java]     [Fedora Legacy]     [Fedora Desktop]     [Fedora Fonts]     [ATA RAID]     [Fedora Marketing]     [Fedora Management Tools]     [Fedora Mentors]     [Fedora Package Review]     [Fedora R Devel]     [Fedora PHP Devel]     [Kickstart]     [Fedora Music]     [Fedora Packaging]     [Centos]     [Fedora SELinux]     [Fedora Legal]     [Fedora Kernel]     [Fedora QA]     [Fedora Triage]     [Fedora OCaml]     [Coolkey]     [Virtualization Tools]     [ET Management Tools]     [Yum Users]     [Tux]     [Yosemite News]     [Yosemite Photos]     [Linux Apps]     [Maemo Users]     [Gnome Users]     [KDE Users]     [Fedora Tools]     [Fedora Art]     [Fedora Docs]     [Maemo Users]     [Asterisk PBX]     [Fedora Sparc]     [Fedora Universal Network Connector]     [Fedora ARM]

  Powered by Linux