Re: Non-contiguous attribute values

Timothy Pollard <timp@xxxxxxxxxxx> · Wed, 12 Mar 2014 09:42:39 +1000

On Tue, 11 Mar 2014 16:38:51 -0600
Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
> On 03/11/2014 04:09 PM, Timothy Pollard wrote:
> > On Tue, 11 Mar 2014 07:17:25 -0600
> > Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
> >> On 03/10/2014 09:17 PM, Timothy Pollard wrote:
> >>> On Mon, 10 Mar 2014 20:56:08 -0600
> >>> Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
> >>>> On 03/10/2014 08:42 PM, Timothy Pollard wrote:
> >>>>> A small update; we're now
> >>>> Now as opposed to some time in the past?  At what point did you begin
> >>>> seeing these messages, and what changed?
> >>> It looks like it started after I manually "fixed" the entry.
> >> What exactly did you do to fix the entry?
> > I edited it and filled in what looked like the missing values (which I
> > copied from an old LDIF file):
> >
> > dNSClass: IN
> > zoneName: cvsdude.com
> > relativeDomainName: testingstatus
> > objectClass: top
> > objectClass: dNSZone
> > dNSTTL: 100
> 
> Did you use ldapdelete to delete old one and ldapmodify/ldapadd to add this
> fixed one?

I actually used ldapvi, which backs onto ldapmodify, so it won't have deleted
the entry, just modified it.
> 
> >
> >>> As I said it is a
> >>> test entry, so I'm happy to delete it entirely and recreate it if you
> >>> think this will fix the issue,
> >> I don't think it will fix the issue, but it may help reproduce it more
> >> easily.
> >>
> >>> but I can hold off on that if you'd like me to find
> >>> out more information.
> >> If you are not experiencing the "non-contiguous" problem now, there's not
> >> much information to get.
> >>
> > We're not seeing the non-contiguous problem any more, but we are seeing
> > repeated DB crashes:
> >
> > [11/Mar/2014:21:57:14 +0000] - libdb: dnsRoot/id2entry.db4 page 36132 is on free list with type 5
> > [11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument
> > [11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
> > [11/Mar/2014:21:57:14 +0000] - Serious Error---Failed in dblayer_txn_abort, err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
> > [11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
> > [11/Mar/2014:21:57:14 +0000] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.
> 
> I don't suppose you are running out of disk space?  Any other disk errors?
> Is this a VM with a virtual disk image holding the db?

We have plenty of disk space, haven't seen any other disk issues, and can't
find any relevant entries in dmesg or /var/log/messages.
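
Log excerpts pasted into mail can end up re-wrapped with several entries per
line, which makes them hard to scan. Below is a minimal Python sketch for
splitting such text back into one entry per timestamp and picking out the
fatal ones. The names split_entries, fatal_entries and FATAL_MARKERS are
illustrative only, and the marker list is an assumption based solely on the
messages quoted in this thread, not a complete list of libdb errors.

```python
import re

# Timestamp prefix as it appears in the quoted excerpt, e.g.
# "[11/Mar/2014:21:57:14 +0000] - "
ENTRY_START = re.compile(
    r"\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\] - "
)

# Substrings treated as fatal here (assumption: taken only from this thread).
FATAL_MARKERS = ("PANIC", "DB_RUNRECOVERY", "FATAL ERROR")


def split_entries(text):
    """Split re-wrapped log text into one string per timestamped entry."""
    # Drop mail quote prefixes ("> > ") and join wrapped lines with spaces.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    starts = [m.start() for m in ENTRY_START.finditer(flat)]
    ends = starts[1:] + [len(flat)]
    return [flat[a:b].strip() for a, b in zip(starts, ends)]


def fatal_entries(entries):
    """Return the entries that contain any of the assumed fatal markers."""
    return [e for e in entries if any(m in e for m in FATAL_MARKERS)]
```

Feeding the pasted excerpt to split_entries() and then fatal_entries() should
leave only the PANIC / DB_RUNRECOVERY / FATAL ERROR lines, which makes it
easier to see which message came first after each restart.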
> 
> >
> > This happens within a few minutes after every restart of the daemon. I'm not
> > sure if this is related though. It (the new DB error) first occurred after
> > ns-slapd was killed by the oom-killer. Could that cause database corruption?
> 
> It is not supposed to, but it is a possibility.
> 
> >
> > It also looks like we might need to do some memory tuning on 389. Is there
> > some suggested documentation on that, or should I just google it?
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Performance_Tuning_Guide/index.html
> is a good place to start

OK, thanks.
> >
> > At the moment we've switched to our other master (we use a multi-master
> > replication setup), so we'll probably just rebuild the problem server from
> > there, but is there anything that I should look at to diagnose the problem
> > first?
> 
> I'm not sure.  Looks like we are now working on several different problems in
> various states of knowledge/severity . . .
> 

Yeah, that's the problem with our system: we can't really tell if it's one
problem with many symptoms, or multiple different problems. I think we may
need to bring in an LDAP consultant.

Thanks for your help; anything else you can point me at to try would be
much appreciated.

-- 
TimP
[http://blog.timp.com.au]
[http://resume.timp.com.au]