On Fri, 2018-05-18 at 18:39 +0000, Fong, Trevor wrote:
> Hi Everyone,
> 
> Huzzah! I've finally licked the slow (and erratic) replication
> between our 1.2 -> 1.3 clusters!
> The problem was that when I was setting up the 1.3 cluster, I'd done
> it with a view to replacing the 1.2 cluster.
> In making that assumption, I'd set the cluster up in isolation.
> Everything worked as it was supposed to, but it didn't occur to me
> to set the masters up with replica IDs different from those in the
> 1.2 cluster. When I hooked the 1.3 cluster up to the 1.2 cluster,
> replication into the 1.3 cluster was slow and sometimes it would
> just break.
> 
> Rebuilding the 1.3 cluster with unique replica IDs for all master
> nodes across both clusters resolved the problem.

Great work finding this. I think we say you need unique rids on all
masters in the docs, but we don't enforce it at a programming level.

TBH I really want rids to be allocated by the server tools - there
are some things in the works for this, but they are not yet ready. A
better idea would be for rids to simply be GUIDs, but I don't think
Ludwig or I want to rewrite all of replication for that :)

> Thanks to everyone for their helpful comments.
> Trev
> 
> 
> On 2018-02-20, 4:13 PM, "Mark Reynolds" <mreynolds@xxxxxxxxxx>
> wrote:
> 
> On 02/20/2018 06:53 PM, William Brown wrote:
> > On Tue, 2018-02-20 at 23:36 +0000, Fong, Trevor wrote:
> >> Hi William,
> >> 
> >> Thanks a lot for your reply.
> >> 
> >> That's correct - a replication schedule is not enabled.
> >> No - there are definitely changes to replicate - I know, I made
> >> the change myself (I changed the "description" attribute on an
> >> account), but it takes up to 15 mins for the change to appear in
> >> the 1.3 master.
> >> That master replicates to another master and a bunch of other
> >> hubs. Those hubs replicate amongst themselves and a bunch of
> >> consumers.
> > 
> > So, to be sure my understanding is correct:
> > 
> > 1.2 <-> 1.3 --> [ group of hubs/consumers ]
> > 
> > Yes?
> > 
> >> The update can take up to 15 mins to make it from the 1.2 master
> >> into the 1.3 master; but once it hits the 1.3 master, it is
> >> replicated around the 1.3 cluster within 1 sec.
> >> 
> >> Only memberOf is disallowed for fractional replication.
> >> 
> >> Can anyone give me any guidance as to the settings of the
> >> "backoff" and other parameters? Any doc links that may be
> >> useful?
> > 
> > Mark? You wrote this one - I can't remember what it's called ...
> 
> Before we adjust the backoff min and max values, we need to
> determine why 1.2.11 is having a hard time updating 1.3.6. 1.3.6 is
> just receiving updates, so it's 1.2.11 that "seems" to be
> misbehaving. So... Is there anything in the errors log on 1.2.11?
> It wouldn't hurt to check 1.3.6, but I think 1.2.11 is where we
> will find our answer.
> 
> If there is nothing in the log, then turn on replication logging
> and do your test update. Once the update hits 1.3.6, turn
> replication logging off. Then we can look at the logs and see what
> happens with your test update.
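For reference, the replication logging Mark mentions here is an error
log level change on cn=config. A minimal sketch, assuming OpenLDAP
client tools and a placeholder hostname and credentials:

# Enable replication debug logging (level 8192) on the 1.2.11 supplier.
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 8192
EOF

# Run the test update, watch the errors log, then turn logging back
# off (0 restores the default level).
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 0
EOF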
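And on the replica IDs that ultimately turned out to be the problem
(see the resolution at the top of this thread): they can be audited
without rebuilding anything. A sketch with a placeholder hostname -
run it against every master in both clusters and confirm no two
masters share an ID; hubs and consumers all report the reserved ID
65535:

# Show the replica ID and suffix for every replica configured on a server.
ldapsearch -H ldap://master1.example.com -D "cn=Directory Manager" -W \
    -b "cn=mapping tree,cn=config" "(objectClass=nsDS5Replica)" \
    nsDS5ReplicaId nsDS5ReplicaRoot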
> But as requested, here is the backoff min & max info:
> http://www.port389.org/docs/389ds/design/replication-retry-settings.html
> 
> >> Thanks a lot,
> >> Trev
> >> 
> >> 
> >> On 2018-02-18, 3:32 PM, "William Brown" <william@xxxxxxxxxxxxx.au>
> >> wrote:
> >> 
> >> On Sat, 2018-02-17 at 01:49 +0000, Fong, Trevor wrote:
> >> > Hi Everyone,
> >> > 
> >> > I've set up a new 389 DS cluster (389-Directory/1.3.6.1
> >> > B2018.016.1710) and have set up a replication agreement from
> >> > our old cluster (389-Directory/1.2.11.15 B2014.300.2010) to a
> >> > master node in the new cluster. Problem is that updates in the
> >> > old cluster take up to 15 mins to make it into the new
> >> > cluster. We need it to be near instantaneous, like it normally
> >> > is. Any ideas what I can check?
> >> 
> >> I am assuming you don't have a replication schedule enabled?
> >> 
> >> In LDAP, replication is always "eventual". So a delay isn't
> >> harmful.
> >> 
> >> But there are many things that can influence this. Ludwig is the
> >> expert, and I expect he'll comment here.
> >> 
> >> Only one master may be "replicating" to a server at a time. So
> >> if your 1.3 server is replicating with other servers, then your
> >> 1.2 server may have to "wait its turn".
> >> 
> >> There is a replication 'backoff' timer that sets how long it
> >> retries, and scales those attempts too. I'm not sure if 1.2 has
> >> this or not though.
> >> 
> >> Another reason could be that there are no changes to be
> >> replicated - replication only runs when there is something to
> >> do. So your 1.2 server may have no changes, or it could be
> >> eliminating the changes with fractional replication.
> >> 
> >> Finally, it's very noisy, but you could consider enabling
> >> replication logging to check what's happening.
> >> 
> >> I hope that helps,
> >> 
> >> > Thanks a lot,
> >> > Trev
> >> > 
> >> > _________________________________________________
> >> > Trevor Fong
> >> > Senior Programmer Analyst
> >> > Information Technology | Engage. Envision. Enable.
> >> > The University of British Columbia
> >> > trevor.fong@xxxxxx | 1-604-827-5247 | it.ubc.ca
> >> 
> >> -- 
> >> Thanks,
> >> 
> >> William Brown
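On the backoff parameters Trevor asked about: per the design page Mark
linked above, the retry interval is bounded by nsds5ReplicaBackoffMin
and nsds5ReplicaBackoffMax (defaults of 3 and 300 seconds) on the
supplier's cn=config. A sketch of tightening the maximum, assuming a
version that carries these attributes and a placeholder hostname:

# Cap the replication retry backoff at 60 seconds instead of 300.
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=config
changetype: modify
replace: nsds5ReplicaBackoffMin
nsds5ReplicaBackoffMin: 3
-
replace: nsds5ReplicaBackoffMax
nsds5ReplicaBackoffMax: 60
EOF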
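Similarly, the fractional replication Trevor describes (everything
replicated except memberOf) lives on the replication agreement entry;
the agreement name and suffix below are hypothetical:

# Exclude only memberOf from incremental replication on this agreement.
ldapmodify -H ldap://old-master.example.com -D "cn=Directory Manager" -W <<'EOF'
dn: cn=to-new-master,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsDS5ReplicatedAttributeList
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberOf
EOF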