Hi,
the fixes for the tickets you mention did change how the changelog is
iterated and how the server handles situations where the start CSN is
not found in the changelog. They also changed the logging, so you
might now see messages that were not there, or were hidden, before.
But I am very surprised to see them so frequently, and I would like
to understand why.
First, some questions: do you have changelog trimming enabled, and if
so, how is it configured? Do you use fractional replication?
Next, is it possible to get the access and error logs covering a
period of an hour from all servers (you can send them off-list)? I
would like to track some of the reported CSNs.
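When tracking CSNs across servers it can help to decode them first. The following is a minimal sketch, assuming the usual 389 DS CSN layout of 20 hex digits: 8 for the timestamp (seconds since the epoch), 4 for the sequence number, 4 for the replica ID and 4 for the sub-sequence number. The names `CSN` and `parse_csn` are made up for this illustration and are not part of the server or its tools.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Decoded fields of a change sequence number (hypothetical helper type)."""
    timestamp: datetime
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields.

    Assumed layout: tttttttt ssss rrrr uuuu (timestamp, sequence number,
    replica ID, sub-sequence number), all hexadecimal.
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    return CSN(
        # int(..., 16) raises ValueError on non-hex input
        timestamp=datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    # First CSN reported in the error log quoted below
    print(parse_csn("57cdfe06000100010000"))
```

Under that assumption, the first CSN in the log below decodes to replica ID 1 with a timestamp about one second before the warning was logged, which makes it easier to find the matching operation in the access logs.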
Regards,
Ludwig
On 09/06/2016 12:31 PM, Ivanov Andrey (M.) wrote:
Hi,
We have been successfully running a build of the 1.3.4 git branch
of 389DS in production on CentOS 7 for about a year
(approximately 40 000 entries, about 4000 groups, hundreds
of reads and tens of writes per second).
Our current topology consists of three servers in a triangle
(each server is a master replicating to the two others, so
there are two read-write replication agreements on each).
Since the fixes for Ticket 48766 ("Replication
changelog can incorrectly skip over updates") and Ticket
48954 ("Replication fails because anchorcsn cannot be
found"), I’ve started to see the following warnings
regularly in the error logs:
[06/Sep/2016:01:21:43 +0200] clcache_load_buffer_bulk -
changelog record with csn (57cdfe06000100010000) not found
for DB_NEXT
[06/Sep/2016:01:21:43 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-lab.<domain>"
(ldap-lab:636) - Can't locate CSN 57cdfe06000100010000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:02:35:25 +0200] - replica_generate_next_csn:
opcsn=57ce0f4e000500020000 <=
basecsn=57ce0f4e000500030000, adjusted
opcsn=57ce0f4e000600020000
[06/Sep/2016:04:10:11 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce257e000400030000) not found
for DB_NEXT
[06/Sep/2016:05:16:58 +0200] - replica_generate_next_csn:
opcsn=57ce352b000000020000 <=
basecsn=57ce352b000100010000, adjusted
opcsn=57ce352b000100020000
[06/Sep/2016:06:56:04 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-ens.<domain>"
(ldap-ens:636) - Can't locate CSN 57ce4c62000100030000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:07:29:00 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-ens.<domain>"
(ldap-ens:636) - Can't locate CSN 57ce541a000200030000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:07:34:20 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-lab.<domain>"
(ldap-lab:636) - Can't locate CSN 57ce5559000100010000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:07:34:27 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-lab.<domain>"
(ldap-lab:636) - Can't locate CSN 57ce5561000000010000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:07:40:17 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce56c0000500030000) not found
for DB_NEXT
[06/Sep/2016:07:40:24 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce56c5000100030000) not found
for DB_NEXT
[06/Sep/2016:08:08:36 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce5d5f000f00010000) not found
for DB_NEXT
[06/Sep/2016:08:12:39 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce5e54000200030000) not found
for DB_NEXT
[06/Sep/2016:08:12:39 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-ens.<domain>"
(ldap-ens:636) - Can't locate CSN 57ce5e54000200030000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:08:26:45 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce61a3000200030000) not found
for DB_NEXT
[06/Sep/2016:08:27:40 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce61d8000200030000) not found
for DB_NEXT
[06/Sep/2016:08:27:40 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-ens.<domain>"
(ldap-ens:636) - Can't locate CSN 57ce61d8000200030000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:08:31:42 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce62c8000300010000) not found
for DB_NEXT
[06/Sep/2016:08:34:05 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce635a000100010000) not found
for DB_NEXT
[06/Sep/2016:08:44:28 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce65c9000200030000) not found
for DB_NEXT
[06/Sep/2016:08:52:25 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-ens.<domain>"
(ldap-ens:636) - Can't locate CSN 57ce67aa000100030000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:08:53:04 +0200] - replica_generate_next_csn:
opcsn=57ce67d1000100020000 <=
basecsn=57ce67d1000200030000, adjusted
opcsn=57ce67d1000200020000
These warnings are present on all three servers and for all
replication agreements. One of the servers is virtual and
the other two are physical.
Replication still seems to work fine in spite of these
warnings. The "replica_generate_next_csn" message is not
new; it has always been there with 1.3.4. The two new
warnings are "clcache_load_buffer_bulk" and "Can't locate
CSN ... in the changelog (DB rc=-30988)". There are no
network problems or anything like that, so the cause could
only be the replication topology (fully connected 3-master
triangle) and/or the servers being rather busy. Is it a
bug, a warning that can be ignored, or something else?
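To see whether the missing CSNs cluster on one originating master, the two new warnings can be tallied by replica ID. This is only a rough sketch: it assumes the replica ID is hex digits 13 to 16 of the 20-digit CSN, and the function name `tally_missing_csns` is made up for illustration.

```python
import re
from collections import Counter

# Matches the CSN in both new warnings:
#   "changelog record with csn (<csn>) not found for DB_NEXT"
#   "Can't locate CSN <csn> in the changelog"
CSN_WARNING = re.compile(
    r"(?:changelog record with csn \(|Can't locate CSN )([0-9a-f]{20})"
)


def tally_missing_csns(log_text: str) -> Counter:
    """Count distinct not-found CSNs per originating replica ID."""
    # Collapse whitespace so entries wrapped over several lines
    # (as in a mail) still match.
    flat = " ".join(log_text.split())
    # A CSN is often reported by both warnings; count it once.
    csns = set(CSN_WARNING.findall(flat))
    # Assumed layout: replica ID is hex digits 13-16 of the CSN.
    return Counter(int(csn[12:16], 16) for csn in csns)


if __name__ == "__main__":
    sample = """
[06/Sep/2016:08:12:39 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce5e54000200030000) not found
for DB_NEXT
[06/Sep/2016:08:12:39 +0200] agmt="cn=Replication from
ldap-adm.<domain> to ldap-ens.<domain>"
(ldap-ens:636) - Can't locate CSN 57ce5e54000200030000 in
the changelog (DB rc=-30988). If replication stops, the
consumer may need to be reinitialized.
[06/Sep/2016:08:31:42 +0200] clcache_load_buffer_bulk -
changelog record with csn (57ce62c8000300010000) not found
for DB_NEXT
"""
    for rid, count in sorted(tally_missing_csns(sample).items()):
        print(f"replica ID {rid}: {count} missing CSN(s)")
```

Run over a full error log, a tally like this would show whether the warnings concern changes originating on all three masters or mainly on one of them.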
Thank you!