On Tue, 11 Mar 2014 07:17:25 -0600
Rich Megginson <rmeggins@xxxxxxxxxx> wrote:

> On 03/10/2014 09:17 PM, Timothy Pollard wrote:
> > On Mon, 10 Mar 2014 20:56:08 -0600
> > Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
> >> On 03/10/2014 08:42 PM, Timothy Pollard wrote:
> >>> A small update; we're now
> >> Now as opposed to some time in the past?  At what point did you begin
> >> seeing these messages, and what changed?
> > It looks like it started after I manually "fixed" the entry.
>
> What exactly did you do to fix the entry?

I edited it and filled in what looked like the missing values (which I
copied from an old LDIF file):

dNSClass: IN
zoneName: cvsdude.com
relativeDomainName: testingstatus
objectClass: top
objectClass: dNSZone
dNSTTL: 100

>
> > As I said it is a
> > test entry, so I'm happy to delete it entirely and recreate it if you think
> > this will fix the issue,
>
> I don't think it will fix the issue, but it may help reproduce it more easily.
>
> > but I can hold off on that if you'd like me to find
> > out more information.
>
> If you are not experiencing the "non-contiguous" problem now, there's not
> much information to get.
>

We're not seeing the non-contiguous problem any more, but we are seeing
repeated DB crashes:

[11/Mar/2014:21:57:14 +0000] - libdb: dnsRoot/id2entry.db4 page 36132 is on free list with type 5
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
[11/Mar/2014:21:57:14 +0000] - Serious Error---Failed in dblayer_txn_abort, err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
[11/Mar/2014:21:57:14 +0000] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.

This happens within a few minutes after every restart of the daemon. I'm
not sure if this is related though.
It (the new DB error) first occurred after ns-slapd was killed by the
oom-killer. Could that cause database corruption? It also looks like we
might need to do some memory tuning on 389; is there some suggested
documentation on that, or should I just google it?

At the moment we've switched to our other master (we use a multi-master
replication setup), so we'll probably just rebuild the problem server from
there, but is there anything that I should look at to diagnose the problem
first?

Thanks,

--
TimP
[http://blog.timp.com.au]
[http://resume.timp.com.au]
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users