Re: Searches Hang - Apparently entryrdn index related

Timothy Pollard <timp@xxxxxxxxxxx> · Thu, 13 Feb 2014 08:06:31 +1000

On Wed, 12 Feb 2014 09:22:19 -0800
Noriko Hosoi <nhosoi@xxxxxxxxxx> wrote:
> Rich Megginson wrote:
> > On 02/11/2014 10:32 PM, Timothy Pollard wrote:
> >>
> >> Our system:
> >> $ cat /etc/redhat-release
> >> CentOS release 6.5 (Final)
> >> $ uname -r
> >> 2.6.32-431.3.1.el6.x86_64
> >> $ rpm -q 389-ds
> >> 389-ds-1.2.2-1.el6.noarch
> > rpm -q 389-ds-base
> >
> > Are these servers running in VMs?

No, they're physical servers.
> >
> >> In case they're helpful I have stack traces from during both failed and
> >> successful searches. I can send them through if they are useful.
> >
> > Yes.
> >> Does anyone have any idea what might be causing this, or how we could go
> >> about fixing it? Should we report it as a bug?
> >
> > Yes. https://fedorahosted.org/389/newticket
> >
> > You can attach scripts, logs, stack traces, etc. to the ticket.

OK, I've created a new ticket and attached the stack traces. Thanks.

https://fedorahosted.org/389/ticket/47696

> Can we also have the output from "dbscan
> -f /var/lib/dirsrv/slapd-ldap-04/db/userRoot/entryrdn.db4"  when the problem
> occurs?

Oddly, it didn't break overnight, but I did copy the entryrdn.db4 file the
last time it broke, so I can take dbscans of the backup of the broken one and
of the currently working one. Unfortunately both files are over 140MB, and the
best compression I can manage gets them down to 16MB each, which is too big to
attach to the ticket.

Any suggestions on how to best share such large files?
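
One thing I could still try on the size problem is xz (LZMA) at its highest
preset, which often does better than gzip or bzip2 on text dumps. A minimal
sketch using only the Python standard library; the function name and the
file names in the usage note are made up for illustration, and I have no
idea yet whether this would actually get under the attachment limit:

```python
import lzma
import shutil


def xz_compress(src_path, dst_path, preset=9):
    """Compress src_path into dst_path as .xz and return dst_path.

    The file is streamed in chunks, so a 140MB input is never held
    fully in memory.
    """
    # PRESET_EXTREME trades extra CPU time for a slightly smaller file.
    level = preset | lzma.PRESET_EXTREME
    with open(src_path, "rb") as src, lzma.open(dst_path, "wb", preset=level) as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

Usage would be something like xz_compress("entryrdn-broken.dbscan",
"entryrdn-broken.dbscan.xz"), where the input is the saved dbscan output.
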
> 
> And, reindexing entryrdn is necessary for the temporary recovery? For
> instance, just restarting the server does not help?  I'm wondering whether
> the dncache in memory is corrupted or the entryrdn index itself is ...

I'm pretty sure reindexing is necessary; I haven't specifically tested a
restart on its own, but I have restarted the server while it was broken to try
to resolve other issues, and the searches stayed broken.
> 
> Thanks,
> --noriko

I also forgot to mention a potentially related error message. Occasionally on
entry deletion we see errors like this in our error log:

[12/Feb/2014:20:20:46 +0000] entryrdn-index - _entryrdn_delete_key: Failed to remove ou=test; has children
[12/Feb/2014:20:20:46 +0000] - database index operation failed BAD 1031, err=-1 Unknown error: -1

Since they mention entryrdn and indexes I thought they might be related, but
they don't seem to cause this problem directly: I'm seeing these errors now
while the system is fine, and it has broken before with none of these errors
between the last re-index and the failure.
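
To check that more carefully, the error log could be filtered for entryrdn
messages logged after a given re-index time. A rough sketch, assuming each
log entry is a single line starting with a bracketed timestamp in the format
shown in the excerpt above; the function names are made up for illustration,
and the month parsing assumes an English/C locale:

```python
import re
from datetime import datetime

# Timestamp prefix as in the excerpt, e.g. [12/Feb/2014:20:20:46 +0000]
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")


def entryrdn_errors(lines):
    """Yield (timestamp, message) for log lines whose message mentions entryrdn."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m or "entryrdn" not in m.group("msg"):
            continue
        # %z accepts the +0000 offset, so the result is timezone-aware.
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        yield ts, m.group("msg")


def errors_since(lines, last_reindex):
    """Return entryrdn errors logged at or after last_reindex (timezone-aware)."""
    return [(ts, msg) for ts, msg in entryrdn_errors(lines) if ts >= last_reindex]
```

Passing an open error log file as `lines` and the time of the last re-index
as `last_reindex` would list any such errors before a failure; an empty
result would match what I've seen, i.e. a failure with none of these errors
in between.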

-- 
TimP
[http://blog.timp.com.au]