On Fri, 2018-05-18 at 18:39 +0000, Fong, Trevor wrote:
> Hi Everyone,
> 
> Huzzah! I've finally licked the slow (and erratic) replication
> between our 1.2 -> 1.3 clusters!
> The problem was that when I was setting up the 1.3 cluster, I'd done
> it with a view to replacing the 1.2 cluster.
> In making that assumption, I'd set the cluster up in isolation.
> Everything worked as it was supposed to, but it didn't occur to me
> to set the masters up with replica IDs different from those in the
> 1.2 cluster. When I hooked the 1.3 cluster up to the 1.2 cluster,
> replication into the 1.3 cluster was slow and sometimes it would
> just break.
> 
> Rebuilding the 1.3 cluster with unique replica IDs for all master
> nodes across both clusters resolved the problem.

Great work finding this. I think we say you need unique rids on all
masters in the docs, but we don't enforce it at a programming level.

TBH I really want rids to be allocated by the server tools - there
are some things in the works for this, but they are not yet ready. A
better idea would be for rids to simply be GUIDs, but I don't think
Ludwig or I want to rewrite all of replication for that :)

> Thanks to everyone for their helpful comments.
> Trev
> 
> 
> On 2018-02-20, 4:13 PM, "Mark Reynolds" <mreynolds@xxxxxxxxxx>
> wrote:
> 
> On 02/20/2018 06:53 PM, William Brown wrote:
> > On Tue, 2018-02-20 at 23:36 +0000, Fong, Trevor wrote:
> >> Hi William,
> >> 
> >> Thanks a lot for your reply.
> >> 
> >> That's correct - a replication schedule is not enabled.
> >> No - there are definitely changes to replicate - I know, I made
> >> the change myself (I changed the "description" attribute on an
> >> account), but it takes up to 15 mins for the change to appear in
> >> the 1.3 master.
> >> That master replicates to another master and a bunch of other
> >> hubs. Those hubs replicate amongst themselves and a bunch of
> >> consumers.
> > 
> > So, to be sure my understanding is correct:
> > 
> > 1.2 <-> 1.3 --> [ group of hubs/consumers ]
> > 
> > Yes?
> > 
> >> The update can take up to 15 mins to make it from the 1.2 master
> >> into the 1.3 master; but once it hits the 1.3 master, it is
> >> replicated around the 1.3 cluster within 1 sec.
> >> 
> >> Only memberOf is disallowed for fractional replication.
> >> 
> >> Can anyone give me any guidance as to the settings of the
> >> "backoff" and other parameters? Any doc links that may be
> >> useful?
> > 
> > Mark? You wrote this one - I can't remember what it's called ...
> 
> Before we adjust the backoff min and max values, we need to
> determine why 1.2.11 is having a hard time updating 1.3.6. 1.3.6 is
> just receiving updates, so it's 1.2.11 that "seems" to be
> misbehaving. So... Is there anything in the errors log on 1.2.11?
> It wouldn't hurt to check 1.3.6, but I think 1.2.11 is where we
> will find our answer.
> 
> If there is nothing in the log, then turn on replication logging
> and do your test update. Once the update hits 1.3.6, turn
> replication logging off. Then we can look at the logs and see what
> happens with your test update.
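For reference, the replication logging Mark mentions here is an error
log level change on cn=config. A minimal sketch, assuming OpenLDAP
client tools and a placeholder hostname and credentials:

# Enable replication debug logging (level 8192) on the 1.2.11 supplier.
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 8192
EOF

# Run the test update, watch the errors log, then turn logging back
# off (0 restores the default level).
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 0
EOF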
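And on the replica IDs that ultimately turned out to be the problem
(see the resolution at the top of this thread): they can be audited
without rebuilding anything. A sketch with a placeholder hostname -
run it against every master in both clusters and confirm no two
masters share an ID; hubs and consumers all report the reserved ID
65535:

# Show the replica ID and suffix for every replica configured on a server.
ldapsearch -H ldap://master1.example.com -D "cn=Directory Manager" -W \
    -b "cn=mapping tree,cn=config" "(objectClass=nsDS5Replica)" \
    nsDS5ReplicaId nsDS5ReplicaRoot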
> But as requested, here is the backoff min & max info:
> http://www.port389.org/docs/389ds/design/replication-retry-settings.html
> 
> >> Thanks a lot,
> >> Trev
> >> 
> >> 
> >> On 2018-02-18, 3:32 PM, "William Brown" <william@xxxxxxxxxxxxx.au>
> >> wrote:
> >> 
> >> On Sat, 2018-02-17 at 01:49 +0000, Fong, Trevor wrote:
> >> > Hi Everyone,
> >> > 
> >> > I've set up a new 389 DS cluster (389-Directory/1.3.6.1
> >> > B2018.016.1710) and have set up a replication agreement from
> >> > our old cluster (389-Directory/1.2.11.15 B2014.300.2010) to a
> >> > master node in the new cluster. Problem is that updates in the
> >> > old cluster take up to 15 mins to make it into the new
> >> > cluster. We need it to be near instantaneous, like it normally
> >> > is. Any ideas what I can check?
> >> 
> >> I am assuming you don't have a replication schedule enabled?
> >> 
> >> In LDAP, replication is always "eventual". So a delay isn't
> >> harmful.
> >> 
> >> But there are many things that can influence this. Ludwig is the
> >> expert, and I expect he'll comment here.
> >> 
> >> Only one master may be "replicating" to a server at a time. So
> >> if your 1.3 server is replicating with other servers, then your
> >> 1.2 server may have to "wait its turn".
> >> 
> >> There is a replication 'backoff' timer that sets how long it
> >> retries, and scales those attempts too. I'm not sure if 1.2 has
> >> this or not though.
> >> 
> >> Another reason could be that there are no changes to be
> >> replicated - replication only runs when there is something to
> >> do. So your 1.2 server may have no changes, or it could be
> >> eliminating the changes with fractional replication.
> >> 
> >> Finally, it's very noisy, but you could consider enabling
> >> replication logging to check what's happening.
> >> 
> >> I hope that helps,
> >> 
> >> > Thanks a lot,
> >> > Trev
> >> > 
> >> > _________________________________________________
> >> > Trevor Fong
> >> > Senior Programmer Analyst
> >> > Information Technology | Engage. Envision. Enable.
> >> > The University of British Columbia
> >> > trevor.fong@xxxxxx | 1-604-827-5247 | it.ubc.ca
> >> 
> >> -- 
> >> Thanks,
> >> 
> >> William Brown
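On the backoff parameters Trevor asked about: per the design page Mark
linked above, the retry interval is bounded by nsds5ReplicaBackoffMin
and nsds5ReplicaBackoffMax (defaults of 3 and 300 seconds) on the
supplier's cn=config. A sketch of tightening the maximum, assuming a
version that carries these attributes and a placeholder hostname:

# Cap the replication retry backoff at 60 seconds instead of 300.
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsds5ReplicaBackoffMin
nsds5ReplicaBackoffMin: 3
-
replace: nsds5ReplicaBackoffMax
nsds5ReplicaBackoffMax: 60
EOF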
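Similarly, the fractional replication Trevor describes (everything
replicated except memberOf) lives on the replication agreement entry;
the agreement name and suffix below are hypothetical:

# Exclude only memberOf from incremental replication on this agreement.
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=to-new-master,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsDS5ReplicatedAttributeList
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberOf
EOF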