On 12/08/2013 10:24 PM, Colin Panisset wrote:
Hi Rich, apologies for the delay in replying; I've been out of the
office for a couple of days.
On 12/04/2013 08:31 AM, Rich Megginson wrote:
On 12/02/2013 06:42 PM, Colin Panisset wrote:
I have a 4-way multi-master replication configuration; the servers are
slightly different versions, as below:
A - 1.2.9.9-1.el5 (CentOS 5)
B - 1.2.9.9-1.el5 (CentOS 5)
C - 1.2.10.2-20.el6_3 (CentOS 6)
D - 1.2.11.15-22.el6_4 (CentOS 6)
D was recently brought into the configuration to replace A (ultimately).
I initialized D as a consumer directly from A, and I've confirmed that
replication proceeds throughout the mesh without apparent incident --
there are no errors in /var/log/dirsrv/slapd*/errors relating to
replication.
The problem is that *some* objects under ou=people,dc=foo,dc=bar on D do
not show some objectclasses, notably "person" and
"organizationalPerson". These values don't show up in the output of
ldapsearch, via the console, or when used by an internal search process,
such as populating an nsFilteredRole.
Is this the blocker problem, that filtered roles are not working? Just
trying to gauge the severity.
I was able to work around the problem by changing the filter on those
roles that weren't working -- so it's not a major issue for us now.
However, I'm still trying to work out why it happened.
What is the full objectclass chain for these entries? That is, are
these "inetOrgPerson" entries that are missing the intermediate parent
object classes "person" and "organizationalPerson"?
Yes. The objectclass chain for one example user looks like this:
nscpEntryWSI: objectClass;adcsn-5296e4e8000064a40000;vucsn-5296e4e8000064a40000: top
nscpEntryWSI: objectClass;vucsn-5296e4e8000064a40000: inetOrgPerson
nscpEntryWSI: objectClass;vucsn-5296e4e8000064a40000: posixAccount
nscpEntryWSI: objectClass;vucsn-5296e66c000164a40000: ntUser
nscpEntryWSI: objectClass;vucsn-5296e48b000064a40000;deleted: organizationalPerson
nscpEntryWSI: objectClass;vucsn-5296e48b000064a40000;deleted: person
I note the 'deleted' flag on the 'person' and 'organizationalPerson' values.
That's very interesting. That is only supposed to happen when you
explicitly delete those values. I don't suppose you can find that
operation (CSN 5296e48b000064a40000) in the changelog on any of your
servers?
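As a minimal sketch (not part of the original exchange), the state information quoted above can be scanned for values carrying the 'deleted' flag, together with the CSN attached to them. The function name deleted_values is illustrative, and the code assumes each value sits on one unwrapped line of the form "nscpEntryWSI: attr;option;option: value", which is inferred only from the output shown in this thread.

```python
import re

# One unwrapped state line: "nscpEntryWSI: <attr>;<opt>;...: <value>"
# (format assumed from the ldapsearch output quoted in this thread).
STATE_LINE = re.compile(r"^nscpEntryWSI:\s*([^;:\s]+)((?:;[^;:\s]+)*):\s*(.*)$")


def deleted_values(lines):
    """Return (attribute, value, csn) for every value flagged ';deleted'."""
    found = []
    for line in lines:
        m = STATE_LINE.match(line.strip())
        if not m:
            continue  # not a per-value state line
        attr, opts, value = m.group(1), m.group(2), m.group(3)
        options = [o for o in opts.split(";") if o]
        if "deleted" not in options:
            continue
        # The value-update CSN is carried in the "vucsn-<csn>" option, if any.
        csn = next(
            (o.split("-", 1)[1] for o in options if o.startswith("vucsn-")),
            None,
        )
        found.append((attr, value, csn))
    return found


sample = [
    "nscpEntryWSI: objectClass;adcsn-5296e4e8000064a40000;vucsn-5296e4e8000064a40000: top",
    "nscpEntryWSI: objectClass;vucsn-5296e4e8000064a40000: inetOrgPerson",
    "nscpEntryWSI: objectClass;vucsn-5296e48b000064a40000;deleted: organizationalPerson",
    "nscpEntryWSI: objectClass;vucsn-5296e48b000064a40000;deleted: person",
]

for attr, value, csn in deleted_values(sample):
    print(attr, value, csn)
```

Run against the first example entry, this reports organizationalPerson and person, both with CSN 5296e48b000064a40000, which is the operation worth looking for in the changelog.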
I checked the other users in the same boat, and for *some*, it's that
the intermediate parents are marked as 'deleted', and on others they
simply don't exist; for example:
nscpEntryWSI: objectClass;adcsn-5276ebaf000064a40001;vucsn-5276ebaf000064a40001: top
nscpEntryWSI: objectClass;vucsn-5276ebaf000064a40001: inetOrgPerson
nscpEntryWSI: objectClass;vucsn-5276ebaf000064a40001: posixAccount
nscpEntryWSI: objectClass;vucsn-5276ec06000064a40000: inetUser
nscpEntryWSI: objectClass;vucsn-5276ee0d000064a40000: ntUser
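For entries like this one, where the intermediate classes are simply absent, a rough check for missing superior object classes might look like the sketch below. The SUPERIOR table is a hand-written subset covering only the inetOrgPerson chain (inetOrgPerson, organizationalPerson, person, top); it is not read from the server schema, and the function name missing_superiors is illustrative.

```python
# Hand-written subset of the standard schema: class -> direct superior.
# Assumption: only the inetOrgPerson chain matters here; classes that are
# not listed are treated as having no superior to check.
SUPERIOR = {
    "inetorgperson": "organizationalPerson",
    "organizationalperson": "person",
    "person": "top",
}


def missing_superiors(object_classes):
    """Return superior classes implied by the entry but not present on it."""
    present = {oc.lower() for oc in object_classes}
    missing = []
    for oc in object_classes:
        parent = SUPERIOR.get(oc.lower())
        while parent is not None:
            if parent.lower() not in present and parent not in missing:
                missing.append(parent)
            parent = SUPERIOR.get(parent.lower())
    return missing


# objectClass values of the second example entry in this thread.
entry = ["top", "inetOrgPerson", "posixAccount", "inetUser", "ntUser"]
print(missing_superiors(entry))
```

For the entry above this lists organizationalPerson and person; an entry that already carries the whole chain yields an empty list.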
What you can do is work backwards from these entries that are missing
these objectclasses.
1) do an ldapsearch like this to get the replication state information:
ldapsearch -xLLL -D "cn=directory manager" -W -b dc=foo,dc=bar uid=myuser nscpEntryWSI
Among the data will be the first CSN for the entry. The CSN looks like
this:
TTTTTTTTSSSSRRRRUUUU
Where TTTTTTTT is the timestamp as 8 hex digits, SSSS is a sequence
number, RRRR is the replica ID of the server on which the entry
originated, and UUUU is a sub-sequence number. The RRRR is in hex.
Ok, from the above I get that the replica ID is 64a4 in hex, which is
25764 in decimal -- I know which replica of ours this is (we have an
internally-meaningful numbering scheme).
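The hex-to-decimal conversion above can be double-checked with a short script. This is a sketch that follows the TTTTTTTTSSSSRRRRUUUU layout described in this thread; the names CSN and parse_csn are illustrative, and treating the trailing UUUU digits as a sub-sequence number is an assumption, since the thread does not use that field.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # TTTTTTTT, seconds since the Unix epoch
    seqnum: int      # SSSS
    replica_id: int  # RRRR
    subseqnum: int   # UUUU (assumed to be a sub-sequence number)


def parse_csn(text):
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(text) != 20:
        raise ValueError("expected 20 hex digits, got %d characters" % len(text))
    # int(..., 16) raises ValueError if a field is not valid hex.
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


csn = parse_csn("5296e4e8000064a40000")
print(csn.replica_id)  # 25764, matching the value worked out by hand
print(datetime.fromtimestamp(csn.timestamp, tz=timezone.utc).isoformat())
```

The second line of output shows when the change was made, which helps in judging whether the originating server's changelog is still likely to hold it.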
Next, go to the server on which the entry originated. Do this:
dbscan -f /var/lib/dirsrv/slapd-*/cldb/*.db4 -k $theoriginalCSN
The server is supposed to "fill in" the missing parent object classes
during the original add request, but perhaps not during a replicated add.
Unfortunately when I got back to the originating server, the changelog
holding the create event had expired so I wasn't able to see the
original entry. Instead, I added a user (via ldapadd on the
command-line) and traced the replication via that method, but only
specifying the "leaf" classes; the intermediate classes *were* properly
filled in and were replicated correctly.
At this point, I'm not sure what I can do to reproduce the issue.
I'll keep an eye out for it again, and I'm very happy to assist in
further triage if you have any ideas, but from where I sit right now it
appears the trail has gone cold.
Thanks for the hints in tracking it further.
-- C.
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users