Re: Non-contiguous attribute values

Timothy Pollard <timp@xxxxxxxxxxx> · Wed, 12 Mar 2014 08:09:59 +1000

On Tue, 11 Mar 2014 07:17:25 -0600
Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
> On 03/10/2014 09:17 PM, Timothy Pollard wrote:
> > On Mon, 10 Mar 2014 20:56:08 -0600
> > Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
> >> On 03/10/2014 08:42 PM, Timothy Pollard wrote:
> >>> A small update; we're now
> >> Now as opposed to some time in the past?  At what point did you begin
> >> seeing these messages, and what changed?
> > It looks like it started after I manually "fixed" the entry.
> 
> What exactly did you do to fix the entry?

I edited it and filled in what looked like the missing values (which I copied
from an old LDIF file):

dNSClass: IN
zoneName: cvsdude.com
relativeDomainName: testingstatus
objectClass: top
objectClass: dNSZone
dNSTTL: 100

> 
> > As I said it is a
> > test entry, so I'm happy to delete it entirely and recreate it if you think
> > this will fix the issue,
> 
> I don't think it will fix the issue, but it may help reproduce it more easily.
> 
> > but I can hold off on that if you'd like me to find
> > out more information.
> 
> If you are not experiencing the "non-contiguous" problem now, there's not
> much information to get.
> 

We're not seeing the non-contiguous problem any more, but we are seeing
repeated DB crashes:

[11/Mar/2014:21:57:14 +0000] - libdb: dnsRoot/id2entry.db4 page 36132 is on free list with type 5
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
[11/Mar/2014:21:57:14 +0000] - Serious Error---Failed in dblayer_txn_abort, err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
[11/Mar/2014:21:57:14 +0000] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.

This happens within a few minutes of every restart of the daemon. I'm not
sure whether it is related to the non-contiguous problem, though. The new DB
error first occurred after ns-slapd was killed by the oom-killer. Could that
cause database corruption?
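
In case it helps anyone line up the crashes with the restarts, here is a rough
sketch of how the errors log could be scanned for the libdb failures shown
above. It only assumes the line format seen in the excerpt
("[timestamp] - message"). The function name find_db_panics is my own, and the
log path is passed on the command line rather than assumed.

```python
import re
import sys
from datetime import datetime

# Matches lines shaped like the excerpt above: "[11/Mar/2014:21:57:14 +0000] - message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] - (?P<msg>.*)$")


def find_db_panics(lines):
    """Return (timestamp, message) pairs for lines mentioning PANIC or DB_RUNRECOVERY."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not in the expected format
        msg = m.group("msg")
        if "PANIC" in msg or "DB_RUNRECOVERY" in msg:
            # Timestamp format as it appears in the excerpt (English month names assumed)
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((ts, msg))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_errors.py <path-to-errors-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ts, msg in find_db_panics(fh):
            print(ts.isoformat(), msg)
```

Comparing the printed timestamps with the times the daemon was restarted (and
with any oom-killer entries in the kernel log) should show whether the panic
really follows each restart by a few minutes.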

It also looks like we might need to do some memory tuning on 389. Is there any
recommended documentation on that, or should I just google it?

At the moment we've switched to our other master (we use a multi-master
replication setup), so we'll probably just rebuild the problem server from
there, but is there anything that I should look at to diagnose the problem first?

Thanks,
-- 
TimP
[http://blog.timp.com.au]
[http://resume.timp.com.au]
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users