Re: 389DS v1.3.4.x after fixes for tickets 48766 and 48954

On 09/07/2016 08:55 AM, Ludwig Krispenz wrote:

On 09/06/2016 02:02 PM, Ivanov Andrey (M.) wrote:
Hi Ludwig,



The fixes for the tickets you mention changed how the changelog is iterated and how the server handles situations where the start CSN is not found in the changelog. They also changed the logging, so you may now see messages that were absent or hidden before.
That was my understanding too.
So far I have not seen any replication problems related to these messages; all generated CSNs seem to be replicated. What makes it a bit more difficult is that most of the updates are updates of lastlogintime and the original MOD is not logged. I still do not understand why we see these messages so frequently; I will try to reproduce it.
Or, if it is possible, could you run the servers for just an hour with replication logging enabled?
No more need for this; I found the messages in a deployment where replication logging was enabled. I think it happens when the smallest consumer maxCSN is ahead of the local maxCSN for this replicaID.
It should do no harm, but in some scenarios it could slow down replication a bit.
I will continue to investigate and work on a fix.
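
To make the suspected condition concrete, here is a minimal sketch of one possible reading of it: for each replica ID, compare the maxCSN a consumer reports with the maxCSN known locally, and flag the IDs where the consumer is ahead. This is not how the server itself decides anything. The function name, the dictionary layout and the CSN values are made up for illustration, and the comparison relies on CSNs being fixed-width lowercase hex strings so that plain string ordering matches CSN ordering.

```python
def rids_where_consumer_leads(local_ruv, consumer_ruv):
    """Return {rid: (local_max, consumer_max)} where the consumer is ahead.

    Both arguments map a replica ID to a maxCSN string (hypothetical layout).
    """
    leads = {}
    for rid, consumer_max in consumer_ruv.items():
        local_max = local_ruv.get(rid)
        # Fixed-width hex strings order the same way as the CSNs they encode.
        if local_max is not None and consumer_max > local_max:
            leads[rid] = (local_max, consumer_max)
    return leads


# Made-up values, purely for illustration.
local_ruv = {1: "57ce5559000000010000", 3: "57ce5e54000200030000"}
consumer_ruv = {1: "57ce5559000100010000", 3: "57ce5e54000200030000"}
print(rids_where_consumer_leads(local_ruv, consumer_ruv))
```

In this example only replica ID 1 is reported, because the consumer's maxCSN for that ID has a higher sequence number than the local one.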

When looking into the provided data set I noticed three replicated ops with err=50 (insufficient access). This should not happen and requires a separate investigation.


But I am very surprised to see them so frequently and I would like to understand it.
First some questions: do you have changelog trimming enabled, and if so how? Do you have fractional replication?
Yes to both questions.

Trimming: 14 days
Fractional replication:
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp internalCreatorsname

Changelog:
cn=changelog5,cn=config
objectClass: top
objectClass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /Local/dirsrv/var/lib/dirsrv/slapd-ens/changelogdb
nsslapd-changelogmaxage: 14d


replica:
cn=replica,cn=dc\3Did\2Cdc\3Dpolytechnique\2Cdc\3Dedu,cn=mapping tree,cn=config
objectClass: top
objectClass: nsDS5Replica
cn: replica
nsDS5ReplicaId: 1
nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
nsDS5Flags: 1
nsDS5ReplicaBindDN: cn=RepliX,cn=config
nsds5ReplicaPurgeDelay: 604800
nsds5ReplicaTombstonePurgeInterval: 86400
nsds5ReplicaLegacyConsumer: False
nsDS5ReplicaType: 3
nsState:: AQAAAAAAAADCrc5XAAAAAAAAAAAAAAAAAQAAAAAAAAABAAAAAAAAAA==
nsDS5ReplicaName: eeb6d304-736c11e6-9bc5a1ff-40280b8e
nsds5ReplicaChangeCount: 114948
nsds5replicareapactive: 0


Typical replication agreement:

cn=Replication from ldap-lab.<domain name> to ldap-adm.<domain name>,cn=replica,cn=dc\3Did\2Cdc\3Dpolytechnique\2Cdc\3Dedu,cn=mapping tree,cn=config
objectClass: top
objectClass: nsDS5ReplicationAgreement
cn: Replication from ldap-lab.<domain name> to ldap-adm.<domain name>
description: Replication agreement from server ldap-lab.<domain name> to server ldap-adm.<domain name>
nsDS5ReplicaHost: ldap-adm.<domain name>
nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
nsDS5ReplicaPort: 636
nsDS5ReplicaTransportInfo: SSL
nsDS5ReplicaBindDN: cn=RepliX,cn=config
nsDS5ReplicaBindMethod: simple
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp internalCreatorsname
nsds5replicaBusyWaitTime: 5
nsds5ReplicaFlowControlPause: 500
nsds5ReplicaFlowControlWindow: 1000
nsds5replicaTimeout: 120
nsDS5ReplicaCredentials: {AES-...
nsds50ruv: {replicageneration} 57cd7377000000020000
nsds50ruv: {replica 2 ldap://ldap-adm.<domain name>:389}
nsruvReplicaLastModified: {replica 2 ldap://ldap-adm.<domain name>:389} 00000000
nsds5replicareapactive: 0
nsds5replicaLastUpdateStart: 20160906115520Z
nsds5replicaLastUpdateEnd: 20160906115520Z
nsds5replicaChangesSentSinceStartup: 3:13525/670 1:3671/0 2:1/0
nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: Incremental update succeeded
nsds5replicaUpdateInProgress: FALSE
nsds5replicaLastInitStart: 19700101000000Z
nsds5replicaLastInitEnd: 19700101000000Z
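
The nsds5replicaChangesSentSinceStartup value in the agreement above is easier to compare across servers once it is split up. A small sketch, assuming each token has the form replicaID:sent/skipped (that is my reading of the attribute; the function name is invented for this example):

```python
def parse_changes_sent(value):
    """Split a value such as '3:13525/670 1:3671/0' into per-replica counters.

    Assumes each whitespace-separated token is 'replicaID:sent/skipped'.
    """
    result = {}
    for token in value.split():
        rid, counts = token.split(":")
        sent, skipped = counts.split("/")
        result[int(rid)] = {"sent": int(sent), "skipped": int(skipped)}
    return result


stats = parse_changes_sent("3:13525/670 1:3671/0 2:1/0")
for rid, counters in sorted(stats.items()):
    print(rid, counters["sent"], counters["skipped"])
```

Under that assumption, the agreement shown here has sent 13525 changes originating from replica ID 3 and skipped 670 of them, which would fit with fractional replication being enabled.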



Next, is it possible to get the access and error logs for a period of an hour from all servers (you can send them off list) ? I would like to track some of the reported csns.
Sure, I will send them to you off list in a moment.

Thank you,

Regards,
Andrey



Regards,
Ludwig


On 09/06/2016 12:31 PM, Ivanov Andrey (M.) wrote:
Hi,

We have been successfully using the compiled 1.3.4 git branch of 389DS in production on CentOS 7 for about a year (approximately 40,000 entries, about 4,000 groups, hundreds of reads and tens of writes per second).
Our current topology consists of 3 servers in a triangle (each server is a master replicating to the 2 others, so there are two read-write replication agreements on each).

Since the fixes for Ticket 48766 ("Replication changelog can incorrectly skip over updates") and Ticket 48954 ("Replication fails because anchorcsn cannot be found") I have started to see the following warnings regularly in the error logs:

[06/Sep/2016:01:21:43 +0200] clcache_load_buffer_bulk - changelog record with csn (57cdfe06000100010000) not found for DB_NEXT
[06/Sep/2016:01:21:43 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-lab.<domain>" (ldap-lab:636) - Can't locate CSN 57cdfe06000100010000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:02:35:25 +0200] - replica_generate_next_csn: opcsn=57ce0f4e000500020000 <= basecsn=57ce0f4e000500030000, adjusted opcsn=57ce0f4e000600020000
[06/Sep/2016:04:10:11 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce257e000400030000) not found for DB_NEXT
[06/Sep/2016:05:16:58 +0200] - replica_generate_next_csn: opcsn=57ce352b000000020000 <= basecsn=57ce352b000100010000, adjusted opcsn=57ce352b000100020000
[06/Sep/2016:06:56:04 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce4c62000100030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:29:00 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce541a000200030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:34:20 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-lab.<domain>" (ldap-lab:636) - Can't locate CSN 57ce5559000100010000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:34:27 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-lab.<domain>" (ldap-lab:636) - Can't locate CSN 57ce5561000000010000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:40:17 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce56c0000500030000) not found for DB_NEXT
[06/Sep/2016:07:40:24 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce56c5000100030000) not found for DB_NEXT
[06/Sep/2016:08:08:36 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce5d5f000f00010000) not found for DB_NEXT
[06/Sep/2016:08:12:39 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce5e54000200030000) not found for DB_NEXT
[06/Sep/2016:08:12:39 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce5e54000200030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:08:26:45 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce61a3000200030000) not found for DB_NEXT
[06/Sep/2016:08:27:40 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce61d8000200030000) not found for DB_NEXT
[06/Sep/2016:08:27:40 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce61d8000200030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:08:31:42 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce62c8000300010000) not found for DB_NEXT
[06/Sep/2016:08:34:05 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce635a000100010000) not found for DB_NEXT
[06/Sep/2016:08:44:28 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce65c9000200030000) not found for DB_NEXT
[06/Sep/2016:08:52:25 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce67aa000100030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:08:53:04 +0200] - replica_generate_next_csn: opcsn=57ce67d1000100020000 <= basecsn=57ce67d1000200030000, adjusted opcsn=57ce67d1000200020000
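
For readers following the log lines above, a CSN can be split into its fields. The sketch below assumes the usual 389 DS layout of 20 hex digits: 8 for the timestamp (seconds since the epoch), 4 for a sequence number, 4 for the originating replica ID and 4 for a sub-sequence number. The names CSN and parse_csn are made up for this illustration and are not part of the server.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int  # seconds since the Unix epoch
    seqnum: int     # sequence number within that second
    rid: int        # ID of the replica that originated the change
    subseq: int     # sub-sequence number

    @property
    def when(self) -> datetime:
        """Timestamp field as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_csn(text: str) -> CSN:
    """Decode a 20-hex-digit CSN string (assumed layout: 8 + 4 + 4 + 4)."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    return CSN(
        int(text[0:8], 16),
        int(text[8:12], 16),
        int(text[12:16], 16),
        int(text[16:20], 16),
    )


csn = parse_csn("57cdfe06000100010000")
print(csn.rid, csn.seqnum, csn.when.isoformat())
```

With this reading, 57cdfe06000100010000 decodes to replica ID 1 at 2016-09-05 23:21:42 UTC, which lines up with the 01:21:43 +0200 timestamp of the log line that reports it. Because the fields are stored in comparison order, two decoded CSNs can be compared directly, as in the "opcsn <= basecsn" messages.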

These warnings are present on all three servers and for all replication agreements. One of the servers is virtual and the other two are physical.

Replication still seems to work fine in spite of these warnings. The "replica_generate_next_csn" message is not new; it has always been there with 1.3.4. The two new warnings are "clcache_load_buffer_bulk" and "Can't locate CSN ... in the changelog (DB rc=-30988)". There are no network problems or anything like that, so the cause could only be the replication topology (3-master fully connected triangle) and/or the servers being rather busy. Is it a bug, a warning that can be ignored, or something else?
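
To get a feel for how often each warning appears and which originating replica the CSNs belong to, the error log can be tallied with a short script. This is a rough sketch, assuming the message wording shown in the excerpt above and assuming the replica ID sits in hex digits 13 to 16 of the CSN. The function name and the category labels are invented here.

```python
import re
from collections import Counter

# Patterns follow the wording of the three message types quoted above.
PATTERNS = {
    "clcache_not_found": re.compile(
        r"clcache_load_buffer_bulk - changelog record with csn "
        r"\(([0-9a-f]{20})\) not found"
    ),
    "cant_locate_csn": re.compile(
        r"Can't locate CSN ([0-9a-f]{20}) in the changelog"
    ),
    "csn_adjusted": re.compile(
        r"replica_generate_next_csn: opcsn=([0-9a-f]{20})"
    ),
}


def tally_warnings(lines):
    """Count messages per (category, replica ID taken from the CSN)."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                rid = int(match.group(1)[12:16], 16)
                counts[(label, rid)] += 1
                break
    return counts


# Usage (the file name is a placeholder for your instance's errors log):
# with open("errors") as fh:
#     for (label, rid), n in sorted(tally_warnings(fh).items()):
#         print(label, rid, n)
```

Grouping by replica ID makes it easier to see whether the missing CSNs come mostly from one master or are spread across all three.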


Thank you!



--
389-users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://lists.fedoraproject.org/admin/lists/389-users@xxxxxxxxxxxxxxxxxxxxxxx

-- 
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn, 
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander
