Re: Replication Delay

Hi Everyone,

Huzzah!  I've finally licked the slow (and erratic) replication between our 1.2 -> 1.3 clusters!
The problem was that when I was setting up the 1.3 cluster, I'd done it with a view to replacing the 1.2 cluster.
On that assumption, I'd set up the cluster in isolation.  Everything worked as it was supposed to, but it didn't occur to me to set the masters up with replica IDs different from those in the 1.2 cluster.  When I hooked the 1.3 cluster up to the 1.2 cluster, replication into the 1.3 cluster was slow and sometimes it would just break.

Rebuilding the 1.3 cluster with unique replica IDs for all master nodes across both clusters resolved the problem.
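
For anyone who trips over the same thing: each master's replica ID is the
nsDS5ReplicaId attribute on its replica entry under cn=config, so a quick
search along these lines (bind details are placeholders; adjust for your
environment) shows the IDs in use on each server:

    ldapsearch -x -D "cn=Directory Manager" -W -b "cn=config" \
        "(objectClass=nsds5replica)" nsDS5ReplicaId nsDS5ReplicaRoot

Run it against every master in both clusters and check that no two masters
serving the same suffix share an nsDS5ReplicaId value.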

Thanks, everyone, for your helpful comments.
Trev 


On 2018-02-20, 4:13 PM, "Mark Reynolds" <mreynolds@xxxxxxxxxx> wrote:

    
    
    On 02/20/2018 06:53 PM, William Brown wrote:
    > On Tue, 2018-02-20 at 23:36 +0000, Fong, Trevor wrote:
    >> Hi William,
    >>
    >> Thanks a lot for your reply.
    >>
    >> That's correct - replication schedule is not enabled.
    >> No - there are definitely changes to replicate - I know, I made the
    >> change myself: I changed the "description" attribute on an account,
    >> but it takes up to 15 mins for the change to appear in the 1.3
    >> master.
    >> That master replicates to another master and a bunch of other hubs.
    >> Those hubs replicate amongst themselves and a bunch of consumers.
    > So, to check that my understanding is correct:
    >
    > 1.2 <-> 1.3 --> [ group of hubs/consumers ]
    >
    > Yes? 
    >
    >> The update can take up to 15 mins to make it from the 1.2 master
    >> into the 1.3 master; but once it hits the 1.3 master, it is
    >> replicated around the 1.3 cluster within 1 sec.
    >>
    >> Only memberOf is disallowed for fractional replication.
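    >>
    >> (If it's useful, the fractional list can be double-checked on the
    >> agreement entry itself with something like the following - bind
    >> details are placeholders:
    >>
    >> ldapsearch -x -D "cn=Directory Manager" -W -b "cn=config" \
    >>     "(objectClass=nsds5replicationagreement)" \
    >>     nsDS5ReplicatedAttributeList
    >>
    >> which should come back with something like
    >> "nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberOf".)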
    >>
    >> Can anyone give me any guidance as to the settings of the "backoff"
    >> and other parameters?  Any doc links that may be useful?
    > Mark?  You wrote this - I can't remember what it's called...
    Before we adjust the backoff min and max values, we need to determine
    why 1.2.11 is having a hard time updating 1.3.6.  1.3.6 is just
    receiving updates, so it's 1.2.11 that "seems" to be misbehaving.
    So... is there anything in the errors log on 1.2.11?  It wouldn't hurt
    to check 1.3.6, but I think 1.2.11 is where we will find our answer.
    
    If there is nothing in the log, then turn on replication logging and do
    your test update.  Once the update hits 1.3.6, turn replication logging
    off.  Then we can look at the logs and see what happens with your test
    update.
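
    For reference, replication logging is just the error log level on
    cn=config: 8192 is the replication debug level, and 0 puts it back to
    the default.  A sketch (bind details are placeholders):

    ldapmodify -x -D "cn=Directory Manager" -W <<EOF
    dn: cn=config
    changetype: modify
    replace: nsslapd-errorlog-level
    nsslapd-errorlog-level: 8192
    EOF

    Remember to set it back to 0 once your test update has gone through,
    as this logging is very expensive.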
    
    But, as requested, here is the backoff min & max info:
    
    http://www.port389.org/docs/389ds/design/replication-retry-settings.html
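
    If you do end up tuning them, they are plain attributes on the replica
    entry, per the page above.  For example (suffix and values are
    placeholders; the values are in seconds):

    ldapmodify -x -D "cn=Directory Manager" -W <<EOF
    dn: cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
    changetype: modify
    replace: nsds5ReplicaBackoffMin
    nsds5ReplicaBackoffMin: 3
    -
    replace: nsds5ReplicaBackoffMax
    nsds5ReplicaBackoffMax: 300
    EOF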
    
    >
    >> Thanks a lot,
    >> Trev
    >>
    >>
    >> On 2018-02-18, 3:32 PM, "William Brown" <william@xxxxxxxxxxxxxxxx>
    >> wrote:
    >>
    >>     On Sat, 2018-02-17 at 01:49 +0000, Fong, Trevor wrote:
    >>     > Hi Everyone,
    >>     >  
    >>     > I’ve set up a new 389 DS cluster (389-Directory/1.3.6.1
    >>     > B2018.016.1710) and have set up a replication agreement from
    >>     > our old cluster (389-Directory/1.2.11.15 B2014.300.2010) to a
    >>     > master node in the new cluster.  Problem is that updates in
    >>     > the old cluster take up to 15 mins to make it into the new
    >>     > cluster.  We need it to be near instantaneous, like it
    >>     > normally is.  Any ideas what I can check?
    >>     
    >>     I am assuming you don't have a replication schedule enabled?
    >>     
    >>     In LDAP, replication is always "eventual", so a delay isn't
    >>     harmful.
    >>     
    >>     But there are many things that can influence this. Ludwig is the
    >>     expert, and I expect he'll comment here. 
    >>     
    >>     Only one master may be "replicating" to a server at a time.  So
    >>     if your 1.3 server is replicating with other servers, then your
    >>     1.2 server may have to "wait its turn".
    >>     
    >>     There is a replication 'backoff' timer that sets how long it
    >>     waits between retries, and it scales these attempts too.  I'm
    >>     not sure if 1.2 has this or not, though.
    >>     
    >>     Another reason could be that there are no changes to be
    >>     replicated; replication only runs when there is something to do.
    >>     So your 1.2 server may have no changes, or it could be
    >>     eliminating the changes with fractional replication.
    >>     
    >>     Finally, it's very noisy, but you could consider enabling
    >>     replication logging to check what's happening.
    >>     
    >>     I hope that helps,
    >>     
    >>     
    >>     
    >>     >  
    >>     > Thanks a lot,
    >>     > Trev  
    >>     >  
    >>     > _________________________________________________
    >>     > Trevor Fong
    >>     > Senior Programmer Analyst
    >>     > Information Technology | Engage. Envision. Enable.
    >>     > The University of British Columbia
    >>     > trevor.fong@xxxxxx | 1-604-827-5247 | it.ubc.ca
    >>     >  
    >>     -- 
    >>     Thanks,
    >>     
    >>     William Brown
    >>
    

_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx/message/OMFYHKGVL4TFFVHDFLA6VYS6DRJNQDWL/



