Re: [389-users] incorrect DNs sometimes returned on searches: 1.2.6 and 1.2.6.1

Noriko Hosoi <nhosoi@xxxxxxxxxx> · Thu, 14 Oct 2010 15:10:28 -0700

  Eric,

Thank you for your response.  Just to make sure your db is not broken,
could you run these commands and look for any corrupted DIT link while
the DN corruption is being observed?  The output will be large, so I
recommend redirecting it to a file.  I think we are interested only in
the entries around "ou=People,dc=acs,dc=albany,dc=edu" and
"ou=Group,dc=acs,dc=albany,dc=edu".  Since restarting the server fixes
the problem, I am hoping you will not see any corruption at this level.

$ dbscan -f /var/lib/dirsrv/slapd-YOURID/db/YOURBACKEND/id2entry.db4 | 
egrep "dn:|entryid:|parentid:"
     rdn: dc=acs,dc=albany,dc=edu
     entryid: 1
     rdn: ou=People
     parentid: 1
     entryid: 2
     [...]

$ dbscan -f /var/lib/dirsrv/slapd-YOURID/db/YOURBACKEND/entryrdn.db4 \
    -k "ou=People,dc=acs,dc=albany,dc=edu"
ou=People,dc=acs,dc=albany,dc=edu
   ID: #; RDN: "ou=People,dc=acs,dc=albany,dc=edu"; NRDN: "ou=people,dc=acs,dc=albany,dc=edu"
[...]

$ dbscan -f /var/lib/dirsrv/slapd-YOURID/db/YOURBACKEND/entryrdn.db4 \
    -k "ou=Group,dc=acs,dc=albany,dc=edu"
ou=Group,dc=acs,dc=albany,dc=edu
   ID: #; RDN: "ou=Group,dc=acs,dc=albany,dc=edu"; NRDN: "ou=group,dc=acs,dc=albany,dc=edu"
[...]
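
If the id2entry output is too large to check by eye, the parent links
could be checked with a small script.  This is only a sketch:
check_parent_links is a made-up helper name, and it assumes the filtered
output looks like the sample above, with each entry starting at its
"rdn:" line followed by its "entryid:" and (except for the suffix entry)
its "parentid:".

```shell
# check_parent_links (hypothetical helper): read the filtered dbscan output
# on stdin and report every entry whose parentid matches no known entryid.
# Assumption: each record begins with an "rdn:" line, as in the sample.
check_parent_links() {
    awk '
        # Store the record collected so far, then reset for the next one.
        function save_record() {
            if (id != "") { rdn_of[id] = rdn; parent_of[id] = pid; seen[id] = 1 }
            id = ""; pid = ""; rdn = ""
        }
        $1 == "rdn:"      { save_record(); sub(/^[ \t]*rdn:[ \t]*/, ""); rdn = $0; next }
        $1 == "entryid:"  { id = $2; next }
        $1 == "parentid:" { pid = $2; next }
        END {
            save_record()
            for (e in parent_of)
                if (parent_of[e] != "" && !(parent_of[e] in seen))
                    printf "entryid %s (%s): parentid %s not found\n", e, rdn_of[e], parent_of[e]
        }
    '
}

# Usage, with the same placeholders as in the commands above:
#   dbscan -f /var/lib/dirsrv/slapd-YOURID/db/YOURBACKEND/id2entry.db4 |
#       egrep "dn:|entryid:|parentid:" | check_parent_links
```

It prints nothing when every parentid points to an existing entryid, so
any output at all would indicate a broken link on disk.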

Thanks!
--noriko

Eric Torgersen wrote:
> Noriko,
>
>   Please see my comments below.
>
> On Thu, 14 Oct 2010, Noriko Hosoi wrote:
>
>>    Eric,
>>
>> Thanks for your input.  It contains lots of useful information.  Can I
>> ask for some more details about this section?  Is the corrupted DN
>> problem observed only on a replica after a consumer initialization is
>> done, or is it observed on the master as well?
> It is mainly observed on the master.  I think I only observed it on the
> replica because I happened to be doing an initialization at a time when
> the master had some of the corrupted DNs in memory.
>
> On the master, the corrupted DNs can be cleared by a restart - they seem
> to be in memory only.  To fix the replica, I had to reinitialize again
> after restarting the master (because the entries with corrupt DNs were
> written to disk.)  I think the source of the error on the replica was just
> that it was passed bad information from the master.
>
>> When incorrect DNs are detected during the consumer initialization,
>> are they rejected as invalid DNs or just passed through?
> Many were just passed through because they were syntactically valid but
> incorrect DNs, as in the case where the ou=Group part was dropped from
> the DN:
>
> dn: cn=mongrp,dc=acs,dc=albany,dc=edu
>
> A few were rejected because the DN was invalid, as in the case of
> cn=wwwcarey,=albany,,dc=acs,dc=albany,dc=edu
>
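
The difference between the two cases above (a well-formed but wrong DN
versus a malformed one) can be illustrated with a rough syntax check.
This is a simplified sketch and not how the server validates DNs:
dn_looks_valid is a made-up name, and escaped commas and multi-valued
RDNs are deliberately ignored.

```shell
# dn_looks_valid (hypothetical helper): succeed only if every
# comma-separated component of the DN has the form attr=value.
# Simplification: escaped commas and multi-valued RDNs are not handled.
dn_looks_valid() {
    printf '%s\n' "$1" | awk -F, '
        { for (i = 1; i <= NF; i++) if ($i !~ /^[^=]+=.+$/) bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Well-formed, so it would pass through even though ou=Group is missing:
#   dn_looks_valid 'cn=mongrp,dc=acs,dc=albany,dc=edu' && echo "well-formed"
# Malformed, because "=albany" has no attribute type and one component is empty:
#   dn_looks_valid 'cn=wwwcarey,=albany,,dc=acs,dc=albany,dc=edu' || echo "malformed"
```

A check like this cannot catch the first kind of corruption at all,
which fits the observation that those entries were accepted by the
consumer.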
>> Were the events logged in the error
>> log?
> For the rejected DNs, yes:
>
> [14/Oct/2010:10:35:35 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=acs,dc=albany,dc=edu is going offline; disabling replication
> [14/Oct/2010:10:35:35 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
> [14/Oct/2010:10:35:56 -0400] - import userRoot: Processed 50368 entries -- average rate 2398.5/sec, recent rate 2398.4/sec, hit ratio 0%
> [14/Oct/2010:10:36:00 -0400] - import userRoot: WARNING: Skipping entry "cn=wwwchsr,ou=Group,ou=Group,dc=acs,dc=albany,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
> [14/Oct/2010:10:36:01 -0400] - import userRoot: WARNING: bad entry: ID 57588
> ...
> [14/Oct/2010:10:36:08 -0400] - import userRoot: WARNING: Skipping entry "cn=wwwtmu,ou=Group,=albany,dc=edu,dc=acs,dc=albany,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
> [14/Oct/2010:10:36:09 -0400] - import userRoot: WARNING: bad entry: ID 72233
> ...
> [14/Oct/2010:10:36:39 -0400] - import userRoot: Processed 107643 entries -- average rate 1708.6/sec, recent rate 1363.7/sec, hit ratio 95%
> [14/Oct/2010:10:36:47 -0400] - import userRoot: WARNING: Skipping entry "cn=wwwcarey,=albany,,dc=acs,dc=albany,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
> [14/Oct/2010:10:36:48 -0400] - import userRoot: WARNING: bad entry: ID 116490
>
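
To collect every DN that the import rejected this way, the "Skipping
entry" warnings can be pulled out of the consumer's error log.  A
minimal sketch, assuming each record occupies a single line in the log
file itself; skipped_entries is a made-up name, and the log path in the
usage comment is the usual default location, which may differ on your
system.

```shell
# skipped_entries (hypothetical helper): print the DN from every
# 'Skipping entry "..." which has no parent' warning read on stdin.
# Assumption: each log record is on one line in the log file.
skipped_entries() {
    sed -n 's/.*WARNING: Skipping entry "\(.*\)" which has no parent.*/\1/p'
}

# Usage (assumed default error log location):
#   skipped_entries < /var/log/dirsrv/slapd-YOURID/errors
```

The resulting list can then be compared against the DNs returned by a
search on the master while the corruption is still present.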
>
>> Did you have a chance to search for the entry with the corrupted DN
>> (by both the corrupted and the original DN) on the master at that time?
> Yes.  The same DNs showed up corrupted on the master, until I restarted
> it.  Then they appeared fine.
>
> So far, the corrupted DNs seem to be happening less frequently with
> 1.2.6.1, as compared to 1.2.6.  We have been on 1.2.6.1 since yesterday
> evening, and only had this happen once so far with some of the group
> entries.  On 1.2.6, this was usually happening multiple times per day, and
> affecting user entries.
>
> Thanks,
> Eric
> --
> 389 users mailing list
> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/389-users