Chaining woes again...

jacek.nykis at betfair.com (Jacek Nykis) · Mon, 4 Oct 2010 14:43:44 +0100

On Thursday 30 September 2010 00:00:27 Jacek Nykis wrote:
> On Wednesday 29 September 2010 23:56:53 Rich Megginson wrote:
> > Jacek Nykis wrote:
> > > On Wednesday 29 September 2010 23:30:49 Rich Megginson wrote:
> > >> Jacek Nykis wrote:
> > >>> On Wednesday 29 September 2010 14:04:38 Gerrard Geldenhuis wrote:
> > >>>> Hi
> > >>>> I have setup chaining but it is not working at all and I am not sure
> > >>>> how to debug it further.
> > >>>> 
> > >>>> I am using:
> > >>>> 389-admin-1.1.11-0.6.rc2.el5
> > >>>> 389-admin-console-1.1.5-1.el5
> > >>>> 389-admin-console-doc-1.1.5-1.el5
> > >>>> 389-adminutil-1.1.8-4.el5
> > >>>> 389-console-1.1.4-1.el5
> > >>>> 389-ds-1.2.1-1.el5
> > >>>> 389-ds-base-1.2.6-0.11.rc7.el5
> > >>>> 389-ds-console-1.2.3-1.el5
> > >>>> 389-ds-console-doc-1.2.3-1.el5
> > >>>> 389-dsgw-1.1.5-1.el5
> > >>>> 
> > >>>> The setup is 4 servers, two multimasters and two consumers. Client
> > >>>> can only speak to the consumers and thus referrals won't work.
> > >>>> 
> > >>>> 
> > >>>> I have used the following ldif to setup chaining:
> > >>>> 
> > >>>> dn: cn=chainingBackend,cn=chaining database,cn=plugins,cn=config
> > >>>> changetype: add
> > >>>> objectClass: top
> > >>>> objectClass: extensibleObject
> > >>>> objectClass: nsBackendInstance
> > >>>> cn: chainingBackend
> > >>>> nsslapd-suffix: dc=mycompany
> > >>>> nsmultiplexorbinddn: cn=replication manager,cn=config
> > >>>> nsusestarttls: on
> > >>>> nsfarmserverurl: ldaps://masterfqdn1:636 masterfqdn2:636/
> > >>>> nsmultiplexorcredentials: {SSHA}blah
> > >>>> nsbindconnectionslimit: 5
> > >>>> nsconcurrentoperationslimit: 5
> > >>>> nsconnectionlife: 130
> > >>>> nsbindtimeout: 3
> > >>>> nsbindretrylimit: 3
> > >>>> nsmaxresponsedelay: 3
> > >>>> nsmaxtestresponsedelay: 5
> > >>>> 
> > >>>> dn: cn=dc\3dmycompany,cn=mapping tree,cn=config
> > >>>> changetype: modify
> > >>>> add: nsslapd-backend
> > >>>> nsslapd-backend: chainingBackend
> > >>>> -
> > >>>> replace: nsslapd-state
> > >>>> nsslapd-state: backend
> > >>>> -
> > >>>> replace: nsslapd-distribution-plugin
> > >>>> nsslapd-distribution-plugin: /usr/lib64/dirsrv/plugins/libreplication-plugin.so
> > >>>> -
> > >>>> replace: nsslapd-distribution-funct
> > >>>> nsslapd-distribution-funct: repl_chain_on_update
> > >>>> 
> > >>>> 
> > >>>> dn: cn=config,cn=chaining database,cn=plugins,cn=config
> > >>>> changetype: modify
> > >>>> add: nsTransmittedControls
> > >>>> nsTransmittedControls: 2.16.840.1.113730.3.4.12
> > >>>> 
> > >>>> The ACI has been created to allow the Replication Manager user proxy
> > >>>> access.
> > >>>> 
> > >>>> When I run the following on the client:
> > >>>> 
> > >>>> dn: uid=john,ou=people,dc=mycompany
> > >>>> changetype: modify
> > >>>> add: mobile
> > >>>> mobile: 1234
> > >>>> 
> > >>>> The entry gets added but only locally, it thus seems to be
> > >>>> completely ignoring the chaining I have setup. I see the following
> > >>>> in the consumer log after creation:
> > >>>> 
> > >>>> [29/Sep/2010:13:00:11 +0000] start_tls - Received extended operation request with OID 1.3.6.1.4.1.1466.20037
> > >>>> [29/Sep/2010:13:00:11 +0000] start_tls - Start TLS extended operation request confirmed.
> > >>>> [29/Sep/2010:13:00:11 +0000] start_tls - Start TLS request accepted.Server willing to negotiate SSL.
> > >>>> [29/Sep/2010:13:00:11 +0000] start_tls - Starting SSL Handshake.
> > >>>> [29/Sep/2010:13:00:11 +0000] NS7bitAttr - MODIFY begin
> > >>>> [29/Sep/2010:13:00:11 +0000] NSMMReplicationPlugin - Purged state information from entry uid=rytis,ou=People,dc=betfair up to CSN 4c99ec08000000010000
> > >>>> [29/Sep/2010:13:00:12 +0000] roles-plugin - --> roles_post_op
> > >>>> [29/Sep/2010:13:00:12 +0000] roles-plugin - --> roles_cache_change_notify
> > >>>> [29/Sep/2010:13:00:12 +0000] roles-plugin - <-- roles_cache_change_notify: not a role entry
> > >>>> [29/Sep/2010:13:00:12 +0000] roles-plugin - <-- roles_post_op
> > >>>> 
> > >>>> 
> > >>>> There are some other replay failure errors which I am not sure are
> > >>>> related. Having done the test twice I did not see the replay
> > >>>> errors again in the master log. I am going to simplify my test
> > >>>> environment as I currently have 4 servers which all are verbal about
> > >>>> replication and I multimaster netscapedb which adds to the
> > >>>> complications.
> > >>>> 
> > >>>> I have enabled Replication and Plug-ins for the error log, is there
> > >>>> any other recommended logs that I should enable that can assist me
> > >>>> in debugging chaining issues.
> > >>> 
> > >>> Hi,
> > >>> I am working with Gerrard on this issue. I took some packet captures
> > >>> and it would seem that chaining in fact picks up updates but it does
> > >>> not handle them properly.
> > >>> 
> > >>> Our design is:
> > >>> Client ----> Slave ----> Master
> > >>> 
> > >>> We chain all updates on slave to master and client only has access to
> > >>> slave. We also have replication from master to slave.
> > >>> 
> > >>> When I try to make an update here is what happens between client and
> > >>> slave:
> > >>> bindRequest(1) "uid=xxxx,ou=People,dc=xxxx" simple
> > >>> bindResponse(1) success
> > >>> modifyRequest(2) "uid=xxx,ou=people,dc=xxx"
> > >>> modifyResponse(2) operationsError
> > >>> unbindRequest(3)
> > >>> 
> > >>> At the same time between slave and master:
> > >>> searchRequest(1) "<ROOT>" baseObject
> > >>> searchResEntry(1) "<ROOT>" | searchResDone(1) success  [1 result]
> > >>> unbindRequest(2)
> > >>> 
> > >>> This does not look correct (no modification request at all goes to
> > >>> master).
> > >> 
> > >> Right, because it is rejected on the slave due to operationsError
> > > 
> > > Thank you for your answer.
> > > I enabled verbose logging but I am unable to find out what is causing
> > > "operationsError".
> > > 
> > > Log below suggests that chainingBackend is being selected just before
> > > modification starts but I am not sure if it is actually used:
> > 
> > hard to tell from the below - what log level did you use?
> 
> To get this output I enabled:
> Heavy trace output
> Connection management
> Plug-ins
> Access control summary
> 
> > > [29/Sep/2010:22:41:54 +0000] - new connection on 66
> > > [29/Sep/2010:22:41:54 +0000] - activity on 66r
> > > [29/Sep/2010:22:41:54 +0000] - read activity on 66
> > > [29/Sep/2010:22:41:54 +0000] - conn 184 activity level = 0
> > > [29/Sep/2010:22:41:54 +0000] - listener got signaled
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree selected backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree selected backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree release backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree selected backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree release backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree selected backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree release backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - activity on 66r
> > > [29/Sep/2010:22:41:54 +0000] - read activity on 66
> > > [29/Sep/2010:22:41:54 +0000] - do_modify: dn (uid=xxx,ou=people,dc=xxx)
> > > [29/Sep/2010:22:41:54 +0000] - listener got signaled
> > > [29/Sep/2010:22:41:54 +0000] - modifications:
> > > [29/Sep/2010:22:41:54 +0000] -  replace: mobile
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree selected backend : chainingBackend
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree selected backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree release backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] NS7bitAttr - MODIFY begin
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree selected backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - mapping tree release backend : userRoot
> > > [29/Sep/2010:22:41:54 +0000] - activity on 66r
> > > [29/Sep/2010:22:41:54 +0000] - read activity on 66
> > > [29/Sep/2010:22:41:54 +0000] roles-plugin - --> roles_post_op
> > > [29/Sep/2010:22:41:54 +0000] roles-plugin - --> roles_cache_change_notify
> > > [29/Sep/2010:22:41:54 +0000] roles-plugin - <-- roles_post_op
> > > 
> > >>> Does anybody know what the problem could be or where to look for it?

I managed to resolve the problem by stopping the directory server and editing
/etc/dirsrv/slapd-xxx/dse.ldif so that the nsslapd-backend entries appear in
the following order:
nsslapd-backend: userRoot
nsslapd-backend: chainingBackend

After this modification the server started chaining requests properly. I am not
sure exactly which part of my installation procedure caused the problem, but
most of it is done using LDIF files based on the audit log. If I find some more
time I will try to get more details about the exact step that causes the issue.
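
For anyone who wants to check their own dse.ldif for the same ordering problem,
here is a rough sketch in Python. It is only an illustration: the function name
backend_order and the SAMPLE text are made up for this example (the sample is
pieced together from the LDIF quoted above), and the parser is deliberately
naive, so it ignores folded (continued) lines and base64-encoded values. It
lists the nsslapd-backend values of every mapping tree entry in file order, so
you can see whether userRoot comes before chainingBackend.

```python
# Naive check of nsslapd-backend ordering in a dse.ldif-style text.
# Assumption: entries are separated by blank lines and values are not folded.

MAPPING_TREE_SUFFIX = ",cn=mapping tree,cn=config"


def backend_order(ldif_text):
    """Map each mapping tree DN to its nsslapd-backend values in file order."""
    result = {}
    current = None  # backend list of the entry being read, or None
    for line in ldif_text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current entry
            continue
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "dn":
            if value.lower().endswith(MAPPING_TREE_SUFFIX):
                current = result.setdefault(value, [])
            else:
                current = None
        elif current is not None and name == "nsslapd-backend":
            current.append(value)
    return result


# Hypothetical sample in the order that worked for me (raw string keeps "\3d").
SAMPLE = r"""
dn: cn=dc\3dmycompany,cn=mapping tree,cn=config
nsslapd-state: backend
nsslapd-backend: userRoot
nsslapd-backend: chainingBackend
nsslapd-distribution-funct: repl_chain_on_update

dn: cn=config,cn=chaining database,cn=plugins,cn=config
nsTransmittedControls: 2.16.840.1.113730.3.4.12
"""

if __name__ == "__main__":
    for dn, backends in backend_order(SAMPLE).items():
        print(dn, backends)
```

To run it against a real instance, read the file contents (for example
/etc/dirsrv/slapd-xxx/dse.ldif) into a string and pass that to backend_order
instead of SAMPLE.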

Regards
Jacek
