Thanks everyone for your feedback!
Ok, I have written an initial fix; here is how it works and what
I am seeing...
[1] An update comes in and we update the local RUV.
[2] We check this update against the fractional/stripped attrs in
each agmt.
[3] If the update does replicate to at least one agmt, we write a
new attribute to the local RUV (currently called "nsds50replruv" - we
can improve the name later). If it doesn't replicate to any
replicas, then we don't update the new RUV attribute. This all
happens at the same time in write_changelog_and_ruv(), so there is
no delay or copying of useless RUV info, and we write to the local
RUV instead of a new RUV in cn=config (which I had originally
proposed).
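As a minimal sketch of the decision in steps [1]-[3] (Python, illustrative only; every name here except write_changelog_and_ruv() is hypothetical): always advance the local RUV, and advance nsds50replruv only when the update survives stripping for at least one agreement.

```python
def survives_agreement(mod_attrs, stripped_attrs):
    # An update replicates over an agreement if at least one modified
    # attribute is not stripped by that agreement's fractional config.
    return bool({a.lower() for a in mod_attrs} - stripped_attrs)

def update_ruvs(local_maxcsn, repl_maxcsn, csn, mod_attrs, agmts):
    # agmts: one set of stripped attribute names per agreement.
    # Always advance the local RUV maxCSN; advance the replicated RUV
    # (nsds50replruv) only if some agreement would carry the update,
    # mirroring the check done inside write_changelog_and_ruv().
    local_maxcsn = csn
    if any(survives_agreement(mod_attrs, s) for s in agmts):
        repl_maxcsn = csn
    return local_maxcsn, repl_maxcsn
```

So a mod touching only stripped attributes moves the local maxCSN but leaves the replicated one behind, which is exactly the divergence shown in the searches below.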
[4] Here we made an update that is stripped by fractional
replication:
Master A:
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600339000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
...
Master B
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL -p 22222
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
...
[5] If we look at the "fractional" RUV (nsds50replruv) on Master A,
it correctly lines up with the RUV on Master B (nsds50ruv).
[6] Then we make an update that does replicate, and now all the
RUVs line up.
Master A
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
Master B
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL -p 22222
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
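The maxCSNs in the output above are fixed-width hex strings; assuming the usual 389 DS CSN layout (8 hex digits of timestamp, then 4 of sequence number, 4 of replica id, 4 of subsequence number), they can be decoded for inspection like this (illustrative sketch):

```python
def parse_csn(csn):
    # Decode a 20-character 389 DS CSN into its four hex-encoded
    # fields: timestamp, sequence number, replica id, subsequence.
    assert len(csn) == 20
    return {
        "time":   int(csn[0:8], 16),
        "seq":    int(csn[8:12], 16),
        "rid":    int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }

# e.g. "52600790000000010000" -> replica id 1, matching "{replica 1 ...}"
```

Because the fields are fixed-width with the timestamp first, two CSNs can also be ordered by plain string comparison, which is all a monitor needs to tell "behind" from "in sync".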
There are still the same problems with this fix as I mentioned
before, except we're not updating the dse config. I am, however,
concerned about the performance hit of checking whether a mod gets
"replicated". As for the "sync" question, this fix does not change
how that behaves, or how repl-monitor already works: a replica is
either behind (by a certain amount of time) or in sync. I'm not
trying to improve the current repl status model.
Anyway, I just wanted to see if I could get this working. Comments
welcome.
Thanks again,
Mark
On 10/17/2013 05:44 AM, thierry bordaz wrote:
On 10/17/2013 11:06 AM, Ludwig Krispenz wrote:
On 10/17/2013 10:56 AM, thierry bordaz wrote:
On 10/17/2013 10:49 AM, Ludwig Krispenz wrote:
On 10/17/2013 10:15 AM, thierry bordaz wrote:
On 10/16/2013 05:41 PM, Ludwig Krispenz wrote:
On 10/16/2013 05:28 PM, Mark Reynolds wrote:
On 10/16/2013 11:05 AM, Ludwig Krispenz wrote:
On 10/15/2013 10:41 PM, Mark Reynolds wrote:
https://fedorahosted.org/389/ticket/47368
So we run into issues when trying to figure out
if replicas are in sync (if those replicas use
fractional replication and "strip mods"). What
happens is that an update is made on master A,
but due to fractional replication there is no
update made to any replicas. So if you look at
the RUV in the tombstone entry on each server,
it would appear they are out of sync. So the
RUV in the db tombstone is no longer accurate
when using fractional replication.
I'm proposing a new RUV to be stored in the
backend replica entry, e.g.
cn=replica,cn="dc=example,dc=com",cn=mapping
tree,cn=config. I'm calling this the "replicated
RUV". So whenever we actually send an update to
a replica, this RUV will get updated.
I don't see how this will help; you have
additional info on what has been replicated (which
is available on the consumer as well) and you have a
max csn, but you don't know if there are outstanding
fractional changes to be sent.
Well, you will know on master A what operations get
replicated (this updates the new RUV before sending any
changes), and you can use this RUV to compare against
master B's RUV (in its replication agreement). Maybe I
am missing your point?
My point is that the question is what is NOT yet
replicated. Without fractional replication you have
states of the RUV on all servers, and if ruv(A) >
ruv(B) you know there are updates missing on B. With
fractional, if ruv(A) > ruv(B) this might be ok or
not. If you keep an additional RUV on A when sending
updates to B, you can only record what was sent or
attempted to send, but not what still has to be sent.
I agree with you Ludwig, but unless I missed something,
would it not be enough to know that replica B is late or
in sync?
For example, we have updates U1, U2, U3 and U4. U3 should
be skipped by fractional replication.
The replica RUV (tombstone) on master_A contains U4 and
master_B's replica RUV contains U1.
Let's assume that the initial value of the "replicated ruv"
on master_A is U1.
Starting a replication session, master_A should send U2
and update the "replicated ruv" to U2.
If the update is successfully applied on master_B,
master_B's replica RUV is U2 and monitoring the two RUVs
should show they are in sync.
They are not, since U4 is not yet replicated. On master_A
you see the "normal" RUV as U4 and the "replicated" RUV as
U2, but you don't know how many changes are between U2 and
U4 and if any of them should be replicated; the replicated
RUV is more or less a local copy of the remote RUV.
Yes, I agree they are not, but this is a transient status.
Transient because the RA will continue going through the
changelog until it hits U4. At this point it will write U4
in the "replicated RUV", and until master_B applies U4 both
servers will appear out of sync.
My understanding is that this "replicated RUV" only says
whether it is in sync or not, but does not address how far a
server is out of sync from the other (how many updates are
missing). When you say it is more or less a copy, that is
exactly what it is. If it is a copy => in sync; if it is
different => out of sync.
Maybe we need to define what "in sync" means. For me, in sync
means both servers have the same set of updates applied.
Forget fractional for a moment: if we have standard replication
and master A is at U4 and master B is at U2, we say they are not
in sync - or not? You could keep a replicated RUV for those as
well, but this wouldn't change things.
I agree we need to agree on what "in sync" means :-)
I would prefer to speak of a 'fractional ruv' (in place of
'replicated ruv') for the new RUV proposed by Mark,
'replica ruv' being the traditional RUV (tombstone) used in
standard replication.
With the 'replica ruv', we are in sync when the 'replica ruv' on
both sides has the same value.
With the 'fractional ruv', we are in sync when the 'fractional
ruv' on the supplier and the 'replica ruv' on the consumer have
the same value.
In fractional replication, we have updates U1, U2, U3 and U4.
Let U3 and U4 be skipped by fractional.
Let master_A's 'replica ruv' be U4 and master_B's 'replica ruv'
be U2, and no new updates.
From a standard replication point of view they are out of sync,
but for fractional they are in sync.
For fractional, how do we know that both masters are in sync?
With Mark's solution the 'fractional ruv' shows U2.
Now a new update U5 arrives that is not skipped by fractional.
master_A's 'replica ruv' is U5 and master_B's 'replica ruv' is
U2. Until the replication agreement starts a new replication
session, the 'fractional ruv' shows U2.
The servers are shown 'in sync' because the RA has not yet
started.
From my understanding, the solution proposed by Mark has a
drawback: for a transient period (the time for the RA to start
its job, evaluate and send U5, and store it into the 'fractional
ruv'), the servers will appear 'in sync' although they are not.
It could be an issue with scheduled replication, but should be a
transient wrong status under normal conditions.
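The in-sync rule above (supplier 'fractional ruv' vs consumer 'replica ruv', per replica id) can be sketched as follows; this is illustrative only, with RUVs modeled as simple rid -> maxCSN maps and CSNs compared as fixed-width hex strings.

```python
def sync_status(supplier_fractional_ruv, consumer_replica_ruv):
    # For each replica id in the supplier's fractional RUV, the
    # consumer is 'in sync' if its replica-RUV maxCSN has reached the
    # supplier's fractional-RUV maxCSN, and 'behind' otherwise.
    status = {}
    for rid, max_csn in supplier_fractional_ruv.items():
        seen = consumer_replica_ruv.get(rid, "0" * 20)
        status[rid] = "in sync" if seen >= max_csn else "behind"
    return status
```

This also shows the transient window described above: after U5 arrives but before the RA runs, the supplier's fractional RUV still holds U2's CSN, so the pair reports 'in sync' until the session starts and the fractional RUV advances.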
If the update is not applied, master_B's
replica RUV stays at U1 and the two RUVs will show out of
sync.
In the first case, we have a transient status of 'in sync'
because the replication agreement will evaluate U3, then U4,
then send U4 and store it into the "replicated ruv". At
this point master_A and master_B will appear out of sync
until master_B applies U4.
If U4 was to be skipped by fractional, we have master_B's
RUV and master_A's replicated RUV both showing U2, and that
is correct: both servers are in sync.
Mark, instead of storing the replicated RUV in the replica,
would it not be possible to store it in the replication
agreement (one replicated RUV per RA)? That would solve
the problem of different fractional replication policies.
Do you mean changes that have not been
read from the changelog yet? My plan was to update
the new RUV in perform_operation() - right after all
the "stripping" has been done and there is something
to replicate. We need to have a RUV for replicated
operations.
I guess there are other scenarios I didn't think of,
like if replication is in a backoff state and valid
changes are coming in. Maybe we could do the test
"stripping" earlier in the replication process (when
writing to the changelog?), and then update the new
RUV there instead of waiting until we try to send the
changes.
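Doing the test "stripping" at changelog-write time might look like this sketch (all names hypothetical): for each agreement, decide up front whether anything would remain after stripping, so the new RUV can be updated without waiting for the RA to send.

```python
def agmts_receiving(mod_attrs, agmts):
    # agmts: map of agreement name -> set of stripped attribute names.
    # Returns which agreements would still receive the update after
    # fractional stripping; the new RUV would be updated iff any would.
    attrs = {a.lower() for a in mod_attrs}
    return {name: bool(attrs - stripped)
            for name, stripped in agmts.items()}
```

The trade-off discussed here is that this check runs on every write, not just during a replication session, which is where the performance concern mentioned earlier comes from.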
Since
we cannot compare this "replicated ruv" to the
replica's tombstone RUV, we can instead compare
the "replicated ruv" to the RUV in the replica's
repl agreement (unless it is a dedicated consumer
- here we might be able to still look at the db
tombstone RUV to determine the status).
Problems with this approach:
- All the servers need to have the same
replication configuration (the same fractional
replication policy and attribute stripping) to
give accurate results.
- If one replica has an agreement that does NOT
filter the updates, but also has agreements that
do filter updates, then we cannot correctly
determine its synchronization state with the
fractional replicas.
- Performance hit from updating another RUV (in
cn=config)?
Fractional replication simply breaks our
monitoring process. I'm not sure, without
updating the repl protocol, that we can cover
all deployment scenarios (mixed fractional repl
agmts, etc.). However, I "think" this approach
would work for most deployments (compared to none
at the moment). For IPA, since they don't use
consumers, this approach would work for them.
And finally, all of this would have to be
handled by an updated version of repl-monitor.pl.
This is just my preliminary idea on how to
handle this. Feedback is welcome!!
Thanks in advance,
Mark
--
Mark Reynolds
389 Development Team
Red Hat, Inc
mreynolds@xxxxxxxxxx
--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel