Re: [389-users] more MMR issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Robert Viduya wrote:

On Nov 10, 2009, at 3:25 PM, Rich Megginson wrote:

No. Check the access log to see what operations were submitted to the directory server at or around [10/Nov/2009:08:56:27 -0500]

Are your servers in time sync? Is cn=people rewbell gertrude the agreement that sends updates to the master that is having the spontaneous import problem?

Yes, "cn=people rewbell gertrude" is that agreement. We name our agreements with the name of the database ("people") followed by the name of the source server ("rewbell") followed by the name of the destination server ("gertrude"). Gertrude is the master that had the spontaneous import problem.

We have ntpd running on all our machines, so all times should be in sync.

We have a lot going on, but I found the following in the access log on the problem server:

[10/Nov/2009:08:55:19 -0500] conn=158624 op=71 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f38000000370000 [10/Nov/2009:08:55:19 -0500] conn=158624 op=72 MOD dn="gtdirguid=54e468fd7520dbdb1da52263fb4a6576,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:19 -0500] conn=158624 op=72 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f38000100370000 [10/Nov/2009:08:55:19 -0500] conn=158624 op=73 MOD dn="gtdirguid=af56dc17bc397b8371b9eed7d91b2af2,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:19 -0500] conn=158624 op=73 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f38000500370000 [10/Nov/2009:08:55:19 -0500] conn=158624 op=74 MOD dn="gtdirguid=5261a6e61f97bca3f2741b428255daf4,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:19 -0500] conn=158624 op=74 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f39000000370000 [10/Nov/2009:08:55:19 -0500] conn=158624 op=75 MOD dn="gtdirguid=ad5dddfb9ec4fe230644dc060ce641c7,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:19 -0500] conn=158624 op=75 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f39000100370000 [10/Nov/2009:08:55:19 -0500] conn=158624 op=76 MOD dn="gtdirguid=0a87d4434e42edf9df3019b922fe2b63,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:19 -0500] conn=158624 op=76 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f39000200370000 [10/Nov/2009:08:55:20 -0500] conn=158624 op=77 MOD dn="gtdirguid=8be9c47e00957f900b69af162cba8adf,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:20 -0500] conn=158624 op=77 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f7b000000370000 [10/Nov/2009:08:55:20 -0500] conn=158624 op=78 MOD dn="gtdirguid=beef247b54a6e1289f3e392865e08e17,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:20 -0500] conn=158624 op=78 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f7b000400370000 [10/Nov/2009:08:55:20 -0500] conn=158624 op=79 MOD dn="gtdirguid=db2aa7040d3b982cf1a32aad503d7e6f,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:20 -0500] conn=158624 op=79 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f7b000500370000 [10/Nov/2009:08:55:20 -0500] conn=158624 op=80 MOD dn="gtdirguid=c9390deb6018475b4d5d5267ba70fb99,ou=people,dc=gted,dc=gatech,dc=edu" [10/Nov/2009:08:55:20 -0500] conn=158624 op=80 RESULT err=0 tag=103 nentries=0 etime=0 csn=4af96f7b000600370000 [10/Nov/2009:08:55:20 -0500] conn=158624 op=81 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session" [10/Nov/2009:08:55:20 -0500] conn=158624 op=81 RESULT err=0 tag=120 nentries=0 etime=0 [10/Nov/2009:08:55:24 -0500] conn=158624 op=83 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session" [10/Nov/2009:08:55:24 -0500] conn=158624 op=83 RESULT err=0 tag=120 nentries=0 etime=0 [10/Nov/2009:08:55:28 -0500] conn=158624 op=84 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session" [10/Nov/2009:08:55:28 -0500] conn=158624 op=84 RESULT err=0 tag=120 nentries=0 etime=0 [10/Nov/2009:08:56:06 -0500] conn=158624 op=86 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session" [10/Nov/2009:08:56:06 -0500] conn=158624 op=87 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session" [10/Nov/2009:08:56:07 -0500] conn=158624 op=87 RESULT err=0 tag=120 nentries=0 etime=1 [10/Nov/2009:08:56:10 -0500] conn=158624 op=88 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session" [10/Nov/2009:08:56:27 -0500] conn=158624 op=88 RESULT err=0 tag=120 nentries=0 etime=17 [10/Nov/2009:08:56:59 -0500] conn=158624 op=89 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session" [10/Nov/2009:08:56:59 -0500] conn=158624 op=89 RESULT err=0 tag=120 nentries=0 etime=0
[10/Nov/2009:08:57:59 -0500] conn=158624 op=91 UNBIND
[10/Nov/2009:08:57:59 -0500] conn=158624 op=91 fd=64 closed - U1

You can see that replication was working until around 8:55:20 and then there's a series of "Replication Start Session" operations with no corresponding "End Session". The log messages from the supplier server (which I sent in the previous mail) seem to indicate that some communication problems may have occurred (all those "Unable to receive response" logs).

Digging through the audit-log, I found the following:

time: 20091110085608
dn: cn="ou=people,dc=gted,dc=gatech,dc=edu",cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-referral
nsslapd-referral: ldap://rewbell.iam.gatech.edu:389/ou%3Dpeople%2Cdc%3Dgted%2C
dc%3Dgatech%2Cdc%3Dedu
-
replace: modifiersname
modifiersname: cn=server,cn=plugins,cn=config
-
replace: modifytimestamp
modifytimestamp: 20091110135607Z
-

time: 20091110085610
dn: cn="ou=people,dc=gted,dc=gatech,dc=edu",cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-referral
nsslapd-referral: ldaps://rw.gted.gatech.edu:636/ou=people,dc=gted,dc=gatech,d
c=edu
-
replace: modifiersname
modifiersname: cn=server,cn=plugins,cn=config
-
replace: modifytimestamp
modifytimestamp: 20091110135610Z
-

time: 20091110085625
dn: cn="ou=people,dc=gted,dc=gatech,dc=edu",cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-state
nsslapd-state: referral
-
replace: modifiersname
modifiersname: cn=server,cn=plugins,cn=config
-
replace: modifytimestamp
modifytimestamp: 20091110135625Z
-

time: 20091110085626
dn: cn="ou=people,dc=gted,dc=gatech,dc=edu",cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-state
nsslapd-state: backend
-
replace: modifiersname
modifiersname: cn=server,cn=plugins,cn=config
-
replace: modifytimestamp
modifytimestamp: 20091110135626Z
-

The referral for that suffix first gets changed "ldap://rewbell.iam.gatech.edu:389/...";, which isn't right.

Then it get changed to "ldaps://rw.gted.gatech.edu:636/..." which is correct (and what we have configured in the nsDS5ReplicaReferral attribute).

Then the state gets changed a couple of times too. Not sure if this is a cause or effect of the problem we're seeing, but I'm inclined to think it's an effect of the server thinking it's been told to do an import, not a cause. IMHO, of course.
Yes, I think you are correct. The behavior in the logs on the consumer is consistent with that. When a supplier starts an initialization of a consumer, it sets the consumer's database to referral mode - that is, the consumer should not attempt to process any requests for that database during initialization - it's database is offline while being initialized.

Another odd thing is that op=85 is not logged - not sure what happened there.

It would be useful to take a look at the supplier access and error logs from around 08:55:28 to 08:56:27

--
389 users mailing list
389-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users

<<attachment: smime.p7s>>

--
389 users mailing list
389-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users

[Index of Archives]     [Fedora Directory Users]     [Fedora Directory Devel]     [Fedora Announce]     [Fedora Legacy Announce]     [Kernel]     [Fedora Legacy]     [Share Photos]     [Fedora Desktop]     [PAM]     [Red Hat Watch]     [Red Hat Development]     [Big List of Linux Books]     [Gimp]     [Yosemite News]

  Powered by Linux