Hi,
Running 389-ds 1.1.2 on CentOS 5.
We have suddenly started seeing repl5_inc_waitfor_async_results timeout errors in our errors log during peak traffic hours.
When this happens, the master loses contact with the hubs and replication stalls. Most of the time it recovers on its own within a couple of minutes, but we have been restarting the hubs to speed up recovery.
tail -f errors
[08/Jan/2015:02:42:08 -0800] NSMMReplicationPlugin - agmt="cn=add -> hub1" (hub1:2390): Simple bind resumed
[08/Jan/2015:09:04:38 -0800] - repl5_inc_waitfor_async_results timed out waiting for responses: 0 34222
[08/Jan/2015:09:05:18 -0800] - repl5_inc_waitfor_async_results timed out waiting for responses: 0 33499
[08/Jan/2015:09:05:37 -0800] NSMMReplicationPlugin - agmt="cn=add -> hub2" (hub1:2390): Warning: unable to receive endReplication extended operation response (Can't contact LDAP server)
[08/Jan/2015:09:05:37 -0800] NSMMReplicationPlugin - agmt="cn=add -> hub2" (hub1:2390): Simple bind failed, LDAP sdk error 91 (Can't connect to the LDAP server), Netscape Portable Runtime error -5961 (TCP connection reset by peer.)
[08/Jan/2015:09:05:59 -0800] NSMMReplicationPlugin - agmt="cn=add -> hub2" (hub1:2390): Simple bind resumed
[08/Jan/2015:09:07:43 -0800] NSMMReplicationPlugin - agmt="cn=add -> hub1" (hub1:2390): Warning: unable to receive endReplication extended operation response (Can't contact LDAP server)
[08/Jan/2015:09:07:43 -0800] NSMMReplicationPlugin - agmt="cn=add -> hub1" (hub1:2390): Simple bind failed, LDAP sdk error 91 (Can't connect to the LDAP server), Netscape Portable Runtime error -5961 (TCP connection reset by peer.)
[08/Jan/2015:09:08:05 -0800] NSMMReplicationPlugin - agmt="cn=add -> hub1" (hub1:2390): Simple bind resumed
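To put numbers on how long each hub was unreachable, an excerpt like the one above can be summarized with a small script. This is a minimal Python sketch, assuming every errors-log line starts with a bracketed timestamp and treating an "outage" as the gap between a "Simple bind failed" line and the next "Simple bind resumed" line for the same agreement. The function names are hypothetical helpers, not part of 389-ds.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the "[08/Jan/2015:09:05:37 -0800] ..." prefix of an errors-log line.
LINE_RE = re.compile(r'^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$')
# Extracts the replication agreement name, e.g. agmt="cn=add -> hub1".
AGMT_RE = re.compile(r'agmt="(?P<agmt>[^"]+)"')

def parse_ts(ts):
    # Errors-log timestamps look like 08/Jan/2015:09:05:37 -0800.
    return datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")

def summarize(lines):
    """Return (timeout_count, {agreement: [outage seconds, ...]})."""
    timeouts = 0
    failed_at = {}               # agreement -> time of last "Simple bind failed"
    outages = defaultdict(list)  # agreement -> outage durations in seconds
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        when, msg = parse_ts(m.group("ts")), m.group("msg")
        if "repl5_inc_waitfor_async_results timed out" in msg:
            timeouts += 1
            continue
        a = AGMT_RE.search(msg)
        if not a:
            continue
        agmt = a.group("agmt")
        if "Simple bind failed" in msg:
            failed_at[agmt] = when
        elif "Simple bind resumed" in msg and agmt in failed_at:
            # Only count a "resumed" that follows a failure we saw.
            start = failed_at.pop(agmt)
            outages[agmt].append((when - start).total_seconds())
    return timeouts, dict(outages)
```

Fed the excerpt above (for example `with open("errors") as f: print(summarize(f))`), it reports 2 timeouts and a 22-second gap for both "cn=add -> hub1" and "cn=add -> hub2". The early "Simple bind resumed" at 02:42:08 is skipped because no matching failure precedes it.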
Any idea what is causing this? I checked ADD/DEL operations during the outage and none of them stand out. There were no MOD errors, and all MODs completed within a second.
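A similar check of ADD/DEL/MOD timings can be scripted against the access log. This is a minimal sketch, assuming the access log's RESULT lines carry err=, tag= and etime= fields in the shape shown in the comment below; the tag values 103, 105 and 107 are the LDAP modify, add and delete response tags from RFC 4511. The function name and the time window are illustrative assumptions.

```python
import re
from datetime import datetime

# Assumed shape of an access-log result line, e.g.
# [08/Jan/2015:09:05:00 -0800] conn=12 op=3 RESULT err=0 tag=103 nentries=0 etime=0
RESULT_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\]\s+conn=\S+\s+op=\S+\s+RESULT\s+err=(?P<err>\d+)'
    r'\s+tag=(?P<tag>\d+).*?\betime=(?P<etime>[\d.]+)'
)
# LDAP response protocol-op tags (RFC 4511): modify=103, add=105, delete=107.
TAG_NAMES = {103: "MOD", 105: "ADD", 107: "DEL"}

def slowest_ops(lines, start, end):
    """Return {op: (max etime, error count)} for results logged in [start, end]."""
    stats = {}
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if not m:
            continue
        when = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        name = TAG_NAMES.get(int(m.group("tag")))
        if name is None or not (start <= when <= end):
            continue  # not a write op, or outside the outage window
        max_etime, errors = stats.get(name, (0.0, 0))
        stats[name] = (max(max_etime, float(m.group("etime"))),
                       errors + (m.group("err") != "0"))
    return stats
```

Passing the window around the 09:04 to 09:08 outage gives the worst-case etime and error count per write type, which is a quick way to confirm (or rule out) a slow write during the stall.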
~Shardul
-- 389 users mailing list 389-users@xxxxxxxxxxxxxxxxxxxxxxx https://admin.fedoraproject.org/mailman/listinfo/389-users