Re: [389-users] Data inconsitency during replication

Richard Megginson <rmeggins@xxxxxxxxxx> · Mon, 31 Oct 2011 19:04:19 -0400 (EDT)



----- Original Message -----
> 
> 
> 
> Hi Rich,
> 
> 
> 
> One correction to step 4, “Recreate the cn=replica entry for the
> suffix”: as in the example given below, the suffix is “o=USA”.
> 
> 
> 
> - Recreate the “cn=replica” entry for the suffix as follows.
> 
> dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
> 
> changetype: add
> 
> objectClass: nsds5replica
> 
> objectClass: top
> 
> nsDS5ReplicaRoot: o=USA
> 
> nsDS5ReplicaType: 3
> 
> nsDS5Flags: 1
> 
> nsDS5ReplicaId: 10 --> Please assign the same nsDS5ReplicaId value
> that the master had. In my case, the original master's replica ID was
> 10.
> 
> nsds5ReplicaPurgeDelay: 1
> 
> nsds5ReplicaTombstonePurgeInterval: -1
> 
> cn: replica
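The typo being corrected here (a different suffix in the dn and in nsDS5ReplicaRoot than in the earlier steps) is easy to make by hand. A minimal sketch, assuming Python is available, that builds the record above from a single suffix value so the dn and nsDS5ReplicaRoot cannot drift apart. The function name replica_entry_ldif is hypothetical, the attribute values are copied from the entry above, and the replica ID range check reflects my understanding that read-write replicas use IDs 1 to 65534.

```python
def replica_entry_ldif(suffix: str, replica_id: int) -> str:
    """Build the LDIF 'add' record for the cn=replica entry of one suffix.

    Hypothetical helper for illustration; values mirror the entry in this mail.
    """
    # Assumption: only single-RDN suffixes such as "o=USA" are handled here.
    # A suffix containing commas needs quoting/escaping inside the mapping
    # tree DN, which this sketch does not attempt.
    if "," in suffix:
        raise ValueError("multi-RDN suffixes are not handled by this sketch")
    # Assumption: read-write replicas use IDs in the range 1..65534.
    if not 1 <= replica_id <= 65534:
        raise ValueError("replica ID must be between 1 and 65534")
    lines = [
        f"dn: cn=replica,cn={suffix},cn=mapping tree,cn=config",
        "changetype: add",
        "objectClass: nsds5replica",
        "objectClass: top",
        f"nsDS5ReplicaRoot: {suffix}",
        "nsDS5ReplicaType: 3",
        "nsDS5Flags: 1",
        f"nsDS5ReplicaId: {replica_id}",
        "nsds5ReplicaPurgeDelay: 1",
        "nsds5ReplicaTombstonePurgeInterval: -1",
        "cn: replica",
    ]
    return "\n".join(lines) + "\n"


# Same suffix and replica ID as in the corrected entry above.
print(replica_entry_ldif("o=USA", 10))
```

The output can be saved to a file and fed to ldapmodify in the usual way.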
> 
> 
> 
> Regards,
> 
> Jyoti
> 
> 
> 
> 
> 
> From: Das, Jyoti Ranjan (STSD)
> Sent: Monday, October 31, 2011 2:38 PM
> To: 'Rich Megginson'; General discussion list for the 389 Directory
> server project.
> Subject: RE: [389-users] Data inconsitency during replication
> 
> 
> 
> Hi Rich,
> 
> 
> 
> Thanks a lot for your response. Please find the sample reproducer
> details below. I am not sure about how to log a bug. I will explore
> and do it.

https://bugzilla.redhat.com/enter_bug.cgi?product=389

Use category Replication - General

> 
> 
> 
> 
> 
> Reproducer:
> 
> 
> 
> 
> 
> Step-1:
> 
> 
> 
> Set up a topology in which the Master replicates to the Slave and the
> Slave replicates to the Consumer.
> 
> 
> 
> Master -> Slave-> Consumer.
> 
> 
> 
> Step-2:
> 
> Make sure that all servers are in sync at this point. For example,
> all are in sync up to CSN5 (5 records were added to the master, CSN1
> through CSN5).
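CSN1 to CSN5 here are placeholders. As far as I know, a real CSN in 389 is printed as 20 hex digits: an 8-digit timestamp, a 4-digit sequence number, a 4-digit replica ID and a 4-digit sub-sequence number. A minimal sketch of parsing and ordering them under that assumption. The names CSN and parse_csn are hypothetical, and the sample values are made up.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    # Field order matters: tuples compare field by field, left to right,
    # which is the ordering assumed here for CSNs.
    timestamp: int
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four components."""
    if len(text) != 20:
        raise ValueError("expected 20 hex digits")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


# Made-up values: two changes from replica ID 10 (hex 000a), one second apart.
older = parse_csn("4eaf2a630000000a0000")
newer = parse_csn("4eaf2a640000000a0000")
print(older.replica_id, older < newer)
```

This is why the replica ID chosen during promotion matters: it is part of every CSN the server generates.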
> 
> 
> 
> Step-3:
> 
> 
> 
> Delete the replication agreement from Master to Slave and also from
> Slave to consumer.
> 
> 
> 
> Step-4:
> 
> 
> 
> Promote the Slave to master. Promotion steps are given below.
> 
> 
> 
> - Delete Supplier DN (cn=suppdn,cn=config) from Slave
> 
> - Delete the “cn=replica” entry for the suffix “o=USA” using
> ldapmodify. This also deletes the changelog file.
> 
> Ex: dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
> 
> changetype: delete
> 
> - Modify the cn=o=USA,cn=mapping tree,cn=config entry as below
> 
> EX: dn: cn=o=USA,cn=mapping tree,cn=config
> 
> changetype: modify
> 
> replace: nsslapd-state
> 
> nsslapd-state: backend
> 
> 
> 
> dn: cn=o=USA,cn=mapping tree,cn=config
> 
> changetype: modify
> 
> delete: nsslapd-referral
> 
> - Recreate the “cn=replica” entry for the suffix as follows.
> 
> dn: cn=replica,cn=o=SWIFT,cn=mapping tree,cn=config
> 
> changetype: add
> 
> objectClass: nsds5replica
> 
> objectClass: top
> 
> nsDS5ReplicaRoot: o=SWIFT
> 
> nsDS5ReplicaType: 3
> 
> nsDS5Flags: 1
> 
> nsDS5ReplicaId: 10 --> Please assign the same nsDS5ReplicaId value
> that the master had. In my case, the original master's replica ID was
> 10.
> 
> nsds5ReplicaPurgeDelay: 1
> 
> nsds5ReplicaTombstonePurgeInterval: -1
> 
> cn: replica
> 
> - Restart the slapd process. The Slave is now a Master.
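Since the promotion steps above touch several entries under cn=mapping tree, one cheap sanity check before running them is to confirm that every record names the same suffix. A rough sketch, assuming the steps are collected in one LDIF text. The function name suffixes_in and the regular expressions are my own, and only single-RDN suffixes such as o=USA are handled.

```python
import re

# Matches "dn: cn=<suffix>,cn=mapping tree,cn=config", with an optional
# leading "cn=replica,". Assumption: the suffix contains no commas.
MAPPING_TREE_DN = re.compile(
    r"^dn:\s*(?:cn=replica,)?cn=(?P<suffix>[^,]+?)\s*,\s*cn=mapping tree,\s*cn=config\s*$",
    re.IGNORECASE | re.MULTILINE,
)
REPLICA_ROOT = re.compile(
    r"^nsDS5ReplicaRoot:\s*(?P<suffix>.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def suffixes_in(ldif_text: str) -> set:
    """Return the distinct suffixes (lowercased) named in DNs and replica roots."""
    found = {m.group("suffix").lower() for m in MAPPING_TREE_DN.finditer(ldif_text)}
    found |= {m.group("suffix").lower() for m in REPLICA_ROOT.finditer(ldif_text)}
    return found


# Abbreviated version of the promotion steps as written in this mail.
steps = """\
dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: delete

dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-state
nsslapd-state: backend

dn: cn=replica,cn=o=SWIFT,cn=mapping tree,cn=config
changetype: add
nsDS5ReplicaRoot: o=SWIFT
"""

found = suffixes_in(steps)
if len(found) > 1:
    print("inconsistent suffixes:", sorted(found))
```

More than one suffix in the result means the records do not all refer to the same replica.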
> 
> 
> 
> Am I missing anything in the promotion procedure, or is this not the
> right way to promote a replica?
> 
> 
> 
> Step -5:
> 
> 
> 
> Add a replication agreement between the Slave (newly promoted Master)
> and the Consumer. At this point both the Slave and the Consumer are in
> sync up to CSN5. When creating the agreement, do not initialize the
> consumer.
> 
> 
> 
> Slave (newly promoted Master) -> Consumer.
> 
> 
> 
> Step-6:
> 
> 
> 
> Add 5 more entries to the Slave that was promoted to Master above.
> Let’s assume the CSNs for these 5 entries are CSN6 through CSN10.
> 
> 
> 
> Step-7:
> 
> 
> 
> Now you will see that, of these last 5 entries, only the last few are
> replicated, and replication continues without halting.
> 
> 
> 
> 
> 
> Regards,
> 
> Jyoti
> 
> 
> 
> 
> 
> 
> 
> 
> 
> From: Rich Megginson [mailto:rmeggins@xxxxxxxxxx]
> Sent: Friday, October 28, 2011 10:54 PM
> To: General discussion list for the 389 Directory server project.
> Cc: Das, Jyoti Ranjan (STSD)
> Subject: Re: [389-users] Data inconsitency during replication
> 
> 
> 
> On 10/20/2011 12:45 AM, Das, Jyoti Ranjan (STSD) wrote:
> 
> Hi,
> 
> 
> 
> I am new to 389 Directory Server. Could you please help me with the
> query below?
> 
> Thank you very much in advance.
> 
> 
> 
> Problem statement:
> 
> 
> 
> Data is lost during replication between supplier and consumer when
> the master changelog db file has been deleted for some reason, the
> consumer has been imported with stale data, and the consumer is not
> initialized when the new replication agreement is created. The test
> scenario is given below.
> 
> 
> 
> Test scenario:
> 
> Steps:
> 
> Topology
> 
> Supplier -----------Replication agreement-----------------> Hub
> 
> Both replicas are in sync at this time as mentioned below.
> 
> Take this example: five entries have been added, with CSNs from CSN1
> to CSN5.
> 
> Take a db2ldif with “-r” option from the Hub replica.
> 
> Add another 5 entries on the supplier. Let’s say their CSNs are CSN6
> through CSN10.
> 
> Delete the replication agreements
> 
> Before or after CSN6 to CSN10 have been replicated to the Hub?
> 
> Delete the master changelog db file from the changelogdb directory.
> 
> Supplier or Hub?
> 
> Add another 5 entries on the supplier. Let’s say their CSNs are CSN11
> through CSN15.
> 
> Import the ldif file taken in Step-2 into the Hub replica (this
> initializes the consumer with stale data).
> 
> Create the replication agreement between master and hub with the “do
> not initialize” option.
> 
> Now we see data loss from CSN6 through CSN14. Only the entry with
> CSN15 is replicated to the consumer, and replication then continues
> successfully.
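To make the reported loss concrete, here is a small sketch that lists which changes never reached the consumer. The integers 1 to 15 stand in for the placeholder CSN1 to CSN15 used in this scenario, and missing_on_consumer is a hypothetical helper. In practice the two lists would have to come from comparing the entries on both servers.

```python
def missing_on_consumer(supplier_csns, consumer_csns):
    """Return the supplier's changes, in order, that the consumer lacks."""
    have = set(consumer_csns)
    return [csn for csn in supplier_csns if csn not in have]


# Supplier holds CSN1..CSN15.
supplier = list(range(1, 16))
# Consumer holds the stale import (CSN1..CSN5) plus the one change that
# was reported as replicated afterwards (CSN15).
consumer = list(range(1, 6)) + [15]

missing = missing_on_consumer(supplier, consumer)
print("missing on consumer:", missing)
```

With these inputs the missing list covers CSN6 through CSN14, which is the gap described in the report.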
> 
> 
> 
> 
> 
> Questions:
> 
> Is it correct in this scenario to continue with replication despite
> the data loss, instead of halting the replication?
> 
> From the code analysis:
> 
> File: “ldapserver/ldap/servers/plugins/replication/cl5_api.c”
> 
> If the requested CSN is not found in the changelog db file and is
> also not in the purge list, the code makes the following assumption
> and continues with replication:
> 
> 
> 
> /* there is a special case which can occur just after migration - in
> this case,
> 
> the consumer RUV will contain the last state of the supplier before
> migration,
> 
> but the supplier will have an empty changelog, or the supplier
> changelog will
> 
> not contain any entries within the consumer min and max CSN - also,
> since
> 
> the purge RUV contains no CSNs, the changelog has never been purged
> 
> ASSUMPTIONS - it is assumed that the supplier had no pending changes
> to send
> 
> to any consumers; that is, we can assume that no changes were lost
> due to
> 
> either changelog purging or database reload - bug# 603061 -
> richm@xxxxxxxxxxxx */
> 
> 
> 
> Is it a correct approach in this scenario to halt the replication
> with a fatal error message in the error log file?
> 
> Probably, but then this code would have to be a lot smarter to figure
> out that the problem is due to stale data being imported into the
> consumer. Please file a bug with exact steps to reproduce this
> problem.
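The two behaviours being discussed (continue under the assumption in the quoted comment, or halt with a fatal error) can be written down as a toy decision function. This is only a sketch for discussion, not the logic in cl5_api.c. The names Action and decide, the halt_on_missing flag, and the choice to halt when the purge RUV is not empty are all assumptions of mine.

```python
from enum import Enum


class Action(Enum):
    SEND_AFTER_ANCHOR = "send changes newer than the consumer's max CSN"
    CONTINUE = "assume no changes were lost and continue"
    HALT = "stop and log a fatal error"


def decide(changelog, consumer_max_csn, purge_ruv_empty, halt_on_missing):
    """Toy model of what to do when looking up the consumer's max CSN.

    changelog: CSNs currently present in the supplier changelog.
    purge_ruv_empty: True if the changelog has never been purged.
    halt_on_missing: hypothetical switch for the stricter behaviour.
    """
    if consumer_max_csn in changelog:
        # Normal case: the starting point is known.
        return Action.SEND_AFTER_ANCHOR
    if purge_ruv_empty and not halt_on_missing:
        # The special case from the quoted comment: treat it as
        # "just after migration" and assume nothing was pending.
        return Action.CONTINUE
    # Assumption of this sketch: anything else is treated as fatal.
    return Action.HALT


# Scenario from this thread: changelog recreated after deletion, consumer
# re-imported from a stale export that ends at CSN5.
changelog = [11, 12, 13, 14, 15]
print(decide(changelog, 5, purge_ruv_empty=True, halt_on_missing=False))
print(decide(changelog, 5, purge_ruv_empty=True, halt_on_missing=True))
```

Note that a simple flag cannot tell a genuine post-migration state from a stale import, since both look the same from the supplier side. That is the part that would need smarter code.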
> 
> 
> 
> 
> 
> Regards,
> 
> Jyoti
> 
> 
> 
> -- 389 users mailing list 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/389-users
> 
> 