Re: [EXTERNAL] Re: Replication delay, connection blocking ending in closed - B1

Hi Colin, 

The important point in what you describe is that master2 keeps its replication session open
(i.e. it never sends the end-session ext op, 2.16.840.1.113730.3.5.5) without sending any updates.

So the problem is either on master2 or on the network.

My guess is that master2 has trouble determining the next change to send.
A possible scenario is that the changelog is huge and is being walked from the start
(as described in GitHub issue 4644 ...).
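
One way to spot a supplier that holds its session open without sending anything is to look for long silences per connection in the consumer's access log. The following is only a rough Python sketch, not a 389 DS tool: the function names and the `SAMPLE` list are made up for illustration, the sample lines are copied (unwrapped) from the log excerpt quoted further down, and the timestamp parsing assumes the `[dd/Mon/yyyy:HH:MM:SS.nnnnnnnnn -zzzz] conn=N` layout seen there with English month names.

```python
import re
from datetime import datetime, timedelta

# Matches the start of an access log line as it appears in this thread.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})\.(?P<frac>\d+) "
    r"(?P<tz>[+-]\d{4})\] conn=(?P<conn>\d+) "
)


def parse_line(line):
    """Return (conn, timestamp) for a log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m["ts"] + " " + m["tz"], "%d/%b/%Y:%H:%M:%S %z")
    # The log has nanoseconds; datetime only keeps microseconds, so truncate.
    micros = int(m["frac"][:6].ljust(6, "0"))
    return int(m["conn"]), base + timedelta(microseconds=micros)


def longest_gaps(lines):
    """Map each connection to the longest silence between two of its log lines."""
    last_seen = {}
    worst = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        conn, ts = parsed
        if conn in last_seen:
            gap = ts - last_seen[conn]
            if gap > worst.get(conn, timedelta(0)):
                worst[conn] = gap
        last_seen[conn] = ts
    return worst


# Two lines for conn=270077 taken from the excerpt in this thread.
SAMPLE = [
    '[03/Mar/2021:13:50:35.717330540 -0600] conn=270077 op=5 RESULT err=0 '
    'tag=103 nentries=0 etime=0.002066162 csn=603febdd00045aae0000',
    '[03/Mar/2021:14:06:07.872623403 -0600] conn=270077 op=-1 fd=307 closed - B1',
]

if __name__ == "__main__":
    for conn, gap in sorted(longest_gaps(SAMPLE).items()):
        print(f"conn={conn} longest gap: {gap}")
```

On those two lines the gap for conn=270077 comes out at about 15 and a half minutes, which matches the pause reported in the original post. Run against a full access log, the same function would list every connection's longest silence, so the stalled sessions can be matched against what master2 was doing at that time.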

Regards,
    Pierre

On Thu, Mar 4, 2021 at 8:36 PM Colin Tulloch <Colin.Tulloch@xxxxxxxxxxx> wrote:
I tried it just now but it was way too verbose - we filled up 500 MB of error logs in 15 minutes.

We have a lot more space, but it could take hours before we see another failure (these delays intermittently cause an application to fail).  Unfortunately we can't reproduce it on demand.


-----Original Message-----
From: William Brown [mailto:wbrown@xxxxxxx]
Sent: Wednesday, March 03, 2021 8:38 PM
To: 389-users@xxxxxxxxxxxxxxxxxxxxxxx
Subject: [EXTERNAL] [389-users] Re: Replication delay, connection blocking ending in closed - B1

Can you turn on replication logging? I think it's level 8192 in the errorlog-level.
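
For anyone who wants to script this, here is a small Python sketch that only builds the LDIF text for the change. It assumes the usual `nsslapd-errorlog-level` attribute on `cn=config`; the function name is hypothetical, and how the LDIF gets applied (for example by piping it to `ldapmodify` as Directory Manager) is left to the reader.

```python
REPLICATION_DEBUG = 8192  # the level suggested in this thread


def errorlog_level_ldif(level):
    """Build an LDIF modify record that sets nsslapd-errorlog-level on cn=config."""
    if not isinstance(level, int) or level < 0:
        raise ValueError("level must be a non-negative integer")
    return "\n".join([
        "dn: cn=config",
        "changetype: modify",
        "replace: nsslapd-errorlog-level",
        f"nsslapd-errorlog-level: {level}",
        "",  # trailing newline terminates the record
    ])


if __name__ == "__main__":
    print(errorlog_level_ldif(REPLICATION_DEBUG))
```

Because this level is very verbose, it is worth noting the previously configured value first so the same function can be used to put it back afterwards.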

> On 4 Mar 2021, at 11:12, Colin Tulloch <Colin.Tulloch@xxxxxxxxxxx> wrote:
>
> Hello –

> We are seeing an issue where changes can be very slow to replicate to one of our consumers (up to 15m+).  We have a large topology, but in this case the issue is isolated between 2 masters that replicate to 1 consumer.

> In one example entry addition I found, it appears that we see:

> - connection from master1 to consumer1, bunch of changes pushed
> - then master2 connects to consumer1
> - master1 stops doing changes but stays connected
> - master2 does literally 1 change (success, no issues), stays connected
> - 16 minutes go by with no additional changes or replication EXT ops on that connection (master1->consumer1 EXT ops continue normally…)
> - after that long pause, master2 disconnects - B1 bad BER tag code
> - and then master1 resumes making tons of changes

> Searches in this DB and others continue to take place, as do changes in other DBs.  So it wasn’t as if the server was unresponsive/hung.  It is almost as if that DB went “read-only” for a time – though I’m unable to tell whether anything besides replication was trying and failing to make writes.


> Has anyone seen something like this before?  We see lots of B1 codes randomly, and I’ve never understood what may cause that – the description of corruption/physical network problems does not make much sense.  Maybe if our directories were replicating to each other over the internet, or using Wi-Fi….

> The time it takes for that connection to end in the B1 doesn’t seem to
> line up with any dirsrv or system/TCP timeouts either.


> Nothing illuminating in the error logs really.

> Log snippets of this happening:

> [03/Mar/2021:13:50:34.395939343 -0600] conn=270076 fd=280 slot=280 connection from master1 to consumer1
> ...
> [03/Mar/2021:13:50:34.608149165 -0600] conn=270076 op=13 MOD dn="cn=CRLblahblah,c=US"
> [03/Mar/2021:13:50:34.609565373 -0600] conn=270076 op=13 RESULT err=0 tag=103 nentries=0 etime=0.001451229 csn=603febdd00035aae0000
> [03/Mar/2021:13:50:34.818645762 -0600] conn=270076 op=14 EXT oid="2.16.840.1.113730.3.5.5" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:34.820544342 -0600] conn=270076 op=14 RESULT err=0 tag=120 nentries=0 etime=0.002004143

> [03/Mar/2021:13:50:34.460210520 -0600] conn=270077 fd=307 slot=307 connection from master2 to consumer1
> [03/Mar/2021:13:50:34.460676562 -0600] conn=270077 op=0 BIND dn="cn=replication manager,cn=config" method=128 version=3
> [03/Mar/2021:13:50:34.460992487 -0600] conn=270077 op=0 RESULT err=0 tag=97 nentries=0 etime=0.000376400 dn="cn=replication manager,cn=config"
> <snipped replication startup jargon>
> [03/Mar/2021:13:50:35.715331361 -0600] conn=270077 op=5 MOD dn="cn=CRLblahblah,c=US"
> [03/Mar/2021:13:50:35.717330540 -0600] conn=270077 op=5 RESULT err=0 tag=103 nentries=0 etime=0.002066162 csn=603febdd00045aae0000
> ...
> [03/Mar/2021:13:50:36.828647054 -0600] conn=270076 op=15 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:36.828888493 -0600] conn=270076 op=15 RESULT err=0 tag=120 nentries=0 etime=0.000368293
> [03/Mar/2021:13:50:37.112334122 -0600] conn=270076 op=16 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:37.112782309 -0600] conn=270076 op=16 RESULT err=0 tag=120 nentries=0 etime=0.000624719
> [03/Mar/2021:13:50:38.113792312 -0600] conn=270076 op=17 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> [03/Mar/2021:13:50:38.114092751 -0600] conn=270076 op=17 RESULT err=0 tag=120 nentries=0 etime=0.000476209
> … continued EXT ops on conn=270076, from master1 …
> [03/Mar/2021:14:06:07.872623403 -0600] conn=270077 op=-1 fd=307 closed - B1
> [03/Mar/2021:14:06:07.916640931 -0600] conn=270076 op=1106 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> [03/Mar/2021:14:06:07.917191372 -0600] conn=270076 op=1106 RESULT err=0 tag=120 nentries=0 etime=0.000662946
> [03/Mar/2021:14:06:07.921837193 -0600] conn=270076 op=1107 MOD dn="cn=CRLblahblah,c=US"
> [03/Mar/2021:14:06:07.923754219 -0600] conn=270076 op=1107 RESULT err=0 tag=103 nentries=0 etime=0.002036469 csn=603febdd00065aae0000
> and a flood of changes now ...


> Colin Tulloch
> Architect, USmPKI
> colin.tulloch@xxxxxxxxxxx



Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure


--

389 Directory Server Development Team