Re: Non-contiguous attribute values

Rich Megginson <rmeggins@xxxxxxxxxx> · Tue, 11 Mar 2014 16:38:51 -0600

    On 03/11/2014 04:09 PM, Timothy Pollard wrote:

      On Tue, 11 Mar 2014 07:17:25 -0600 Rich Megginson <rmeggins@xxxxxxxxxx> wrote:

        On 03/10/2014 09:17 PM, Timothy Pollard wrote:

          On Mon, 10 Mar 2014 20:56:08 -0600 Rich Megginson <rmeggins@xxxxxxxxxx> wrote:

            On 03/10/2014 08:42 PM, Timothy Pollard wrote:

              A small update; we're now

            Now as opposed to some time in the past?  At what point did you begin
seeing these messages, and what changed?

          It looks like it started after I manually "fixed" the entry.

        What exactly did you do to fix the entry?

      I edited it and filled in what looked like the missing values (which I copied
from an old LDIF file):

dNSClass: IN
zoneName: cvsdude.com
relativeDomainName: testingstatus
objectClass: top
objectClass: dNSZone
dNSTTL: 100

    Did you use ldapdelete to delete old one and ldapmodify/ldapadd to
    add this fixed one?
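For reference, a minimal sketch of what a delete-then-add could look like as LDIF change records, generated here with Python. The DN is a placeholder (the thread never states the entry's real DN), and the helper name build_recreate_ldif is made up for illustration; only the attribute values come from the entry quoted above.

```python
# Sketch only: this DN is a placeholder, not the real DN of the entry.
ENTRY_DN = "relativeDomainName=testingstatus,zoneName=cvsdude.com,dc=example,dc=com"

# Attribute values copied from the entry quoted in the message.
ATTRIBUTES = [
    ("objectClass", "top"),
    ("objectClass", "dNSZone"),
    ("zoneName", "cvsdude.com"),
    ("relativeDomainName", "testingstatus"),
    ("dNSClass", "IN"),
    ("dNSTTL", "100"),
]


def build_recreate_ldif(dn, attributes):
    """Return two LDIF change records: delete dn, then add it back."""
    lines = [f"dn: {dn}", "changetype: delete", ""]
    lines += [f"dn: {dn}", "changetype: add"]
    lines += [f"{name}: {value}" for name, value in attributes]
    lines.append("")  # a blank line terminates the final record
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_recreate_ldif(ENTRY_DN, ATTRIBUTES))
```

The output could be saved to a file and passed to ldapmodify with -f; the bind options depend on the deployment and are not shown here.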

          As I said it is a
test entry, so I'm happy to delete it entirely and recreate it if you think
this will fix the issue,

        I don't think it will fix the issue, but it may help reproduce it more easily.

          but I can hold off on that if you'd like me to find
out more information.

        If you are not experiencing the "non-contiguous" problem now, there's not
much information to get.

      We're not seeing the non-contiguous problem any more, but we are seeing
repeated DB crashes:

[11/Mar/2014:21:57:14 +0000] - libdb: dnsRoot/id2entry.db4 page 36132 is on free list with type 5
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
[11/Mar/2014:21:57:14 +0000] - Serious Error---Failed in dblayer_txn_abort, err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
[11/Mar/2014:21:57:14 +0000] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.
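To see how often these errors recur across restarts, the errors log could be filtered with something like the sketch below. The line layout and the marker strings are taken only from the excerpt above; they are not an exhaustive list of libdb failure messages, and the function name find_fatal_lines is hypothetical.

```python
import re

# Matches the "[timestamp] - message" layout seen in the excerpt above.
LOG_LINE = re.compile(r"^\[(?P<timestamp>[^\]]+)\] - (?P<message>.*)$")

# Substrings taken from the excerpt; an assumption, not a complete list.
FATAL_MARKERS = ("PANIC", "DB_RUNRECOVERY", "FATAL ERROR")


def find_fatal_lines(log_text):
    """Return (timestamp, message) pairs for lines containing a fatal marker."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and any(marker in match["message"] for marker in FATAL_MARKERS):
            hits.append((match["timestamp"], match["message"]))
    return hits


# Three lines copied from the excerpt above.
SAMPLE = """\
[11/Mar/2014:21:57:14 +0000] - libdb: dnsRoot/id2entry.db4 page 36132 is on free list with type 5
[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument
[11/Mar/2014:21:57:14 +0000] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.
"""

if __name__ == "__main__":
    for timestamp, message in find_fatal_lines(SAMPLE):
        print(timestamp, "|", message)
```

The first sample line is deliberately not matched: it reports the bad page but contains none of the markers, so a broader filter would be needed to catch it.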

    I don't suppose you are running out of disk space?  Any other disk
    errors?  Is this a VM with a virtual disk image holding the db?
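A quick way to rule out the disk-space question is to check the filesystem holding the database directory. This is a rough sketch: the 10% threshold is arbitrary, the function name free_space_report is made up, and the path in the usage note is a guess, since the thread does not say where the db lives.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Return (free_fraction, is_low) for the filesystem containing path.

    min_free_fraction is an arbitrary threshold chosen for illustration.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat that as low.
        return 0.0, True
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < min_free_fraction
```

For example, free_space_report("/var/lib/dirsrv/slapd-example/db") would report on that filesystem, assuming the instance's db directory lives at that (hypothetical) path.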

This happens within a few minutes after every restart of the daemon. I'm not
sure if this is related though. It (the new DB error) first occurred after
ns-slapd was killed by the oom-killer. Could that cause database corruption?

    It is not supposed to, but it is a possibility.

It also looks like we might need to do some memory tuning on 389, is there some
suggested documentation on that, or should I just google it?

https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Performance_Tuning_Guide/index.html

    is a good place to start

At the moment we've switched to our other master (we use a multi-master
replication setup), so we'll probably just rebuild the problem server from
there, but is there anything that I should look at to diagnose the problem first?

    I'm not sure.  Looks like we are now working on several different
    problems in various states of knowledge/severity . . .

Thanks,

      --
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users
