On 03/11/2014 04:09 PM, Timothy Pollard wrote:
> On Tue, 11 Mar 2014 07:17:25 -0600 Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
>> On 03/10/2014 09:17 PM, Timothy Pollard wrote:
>>> On Mon, 10 Mar 2014 20:56:08 -0600 Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
>>>> On 03/10/2014 08:42 PM, Timothy Pollard wrote:
>>>>> A small update; we're now
>>>>
>>>> Now as opposed to some time in the past? At what point did you begin seeing these messages, and what changed?
>>>
>>> It looks like it started after I manually "fixed" the entry.
>>
>> What exactly did you do to fix the entry?
>
> I edited it and filled in what looked like the missing values (which I copied from an old LDIF file):
>
> dNSClass: IN
> zoneName: cvsdude.com
> relativeDomainName: testingstatus
> objectClass: top
> objectClass: dNSZone
> dNSTTL: 100

Did you use ldapdelete to delete the old one and ldapmodify/ldapadd to add this fixed one?

>>> As I said it is a test entry, so I'm happy to delete it entirely and recreate it if you think this will fix the issue,
>>
>> I don't think it will fix the issue, but it may help reproduce it more easily.
>>
>>> but I can hold off on that if you'd like me to find out more information.
>>
>> If you are not experiencing the "non-contiguous" problem now, there's not much information to get.
>
> We're not seeing the non-contiguous problem any more, but we are seeing repeated DB crashes:
>
> [11/Mar/2014:21:57:14 +0000] - libdb: dnsRoot/id2entry.db4 page 36132 is on free list with type 5
> [11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument
> [11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
> [11/Mar/2014:21:57:14 +0000] - Serious Error---Failed in dblayer_txn_abort, err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
> [11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery
> [11/Mar/2014:21:57:14 +0000] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.

I don't suppose you are running out of disk space? Any other disk errors? Is this a VM with a virtual disk image holding the db?

> This happens within a few minutes after every restart of the daemon. I'm not sure if this is related though. It (the new DB error) first occurred after ns-slapd was killed by the oom-killer. Could that cause database corruption?

It is not supposed to, but it is a possibility.

> It also looks like we might need to do some memory tuning on 389, is there some suggested documentation on that, or should I just google it?

https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Performance_Tuning_Guide/index.html
is a good place to start.

> At the moment we've switched to our other master (we use a multi-master replication setup), so we'll probably just rebuild the problem server from there, but is there anything that I should look at to diagnose the problem first?

I'm not sure. Looks like we are now working on several different problems in various states of knowledge/severity . . .

> Thanks,
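
For anyone wanting to check how often these failures recur after each restart, here is a minimal sketch that filters an errors log for the markers seen in the excerpt above. It assumes every line follows the "[timestamp] - message" layout shown in this thread. The function name, the marker list and the sample data are illustrative only and are not part of 389 or libdb.

```python
import re

# Assumed line layout, taken from the excerpt in this thread:
#   [11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] - (?P<msg>.*)$")

# Substrings copied from the quoted log lines (illustrative, not exhaustive).
MARKERS = ("PANIC", "DB_RUNRECOVERY", "database recovery needed")


def find_db_failures(lines):
    """Return (timestamp, message) pairs for lines containing any marker."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group("msg") for marker in MARKERS):
            hits.append((match.group("ts"), match.group("msg")))
    return hits


# The six log lines quoted in this thread, used as sample input.
SAMPLE = [
    "[11/Mar/2014:21:57:14 +0000] - libdb: dnsRoot/id2entry.db4 page 36132 is on free list with type 5",
    "[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: Invalid argument",
    "[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery",
    "[11/Mar/2014:21:57:14 +0000] - Serious Error---Failed in dblayer_txn_abort, err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)",
    "[11/Mar/2014:21:57:14 +0000] - libdb: PANIC: fatal region error detected; run recovery",
    "[11/Mar/2014:21:57:14 +0000] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.",
]

if __name__ == "__main__":
    for ts, msg in find_db_failures(SAMPLE):
        print(ts, msg)
```

To run it against a real server, pass an open file object instead of SAMPLE. The location of the errors log depends on the instance and is not given in this thread.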
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users