Re: 389ds doesn't start

Jan Kowalsky <jankow@xxxxxxxxxxxxxxxxxx> · Thu, 13 Dec 2018 21:37:05 +0100

Hi David,

thanks for the answer,

Am 13.12.2018 um 20:53 schrieb David Boreham:
> 
> On 12/13/2018 12:30 PM, Jan Kowalsky wrote:
>>
>> after dirsrv crashed and trying to restart, I got the following errors
>> and dirsrv doesn't start at all:
>>
>> [13/Dec/2018:20:17:28 +0100] - 389-Directory/1.3.3.5 B2018.298.1116
>> starting up
>> [13/Dec/2018:20:17:28 +0100] - Detected Disorderly Shutdown last time
>> Directory Server was running, recovering database.
>> [13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space
>> from the buffer cache
> 
> ^^^^^^^^^^^^
> This looks to be where the train goes off the rails. Everything below is
> just smoke and flames that results.
> 
> Actually I am wondering : why did the process even continue running
> after seeing a fatal error. I think that's a bug. It should have just
> exited at that point?
> 
>> [13/Dec/2018:20:17:29 +0100] - libdb: BDB1521 Recovery function for LSN
>> 6120 6259890 failed
>> [13/Dec/2018:20:17:29 +0100] - libdb: BDB0061 PANIC: Cannot allocate
>> memory
>> [13/Dec/2018:20:17:29 +0100] - libdb: BDB1546 unable to join the
>> environment
>> [13/Dec/2018:20:17:29 +0100] - Database Recovery Process FAILED. The
>> database is not recoverable. err=-30973: BDB0087 DB_RUNRECOVERY: Fatal
>> error, run database recovery
>> [13/Dec/2018:20:17:29 +0100] - Please make sure there is enough disk
>> space for dbcache (400000 bytes) and db region files
>>
>>
>> Any idea what to do?
> 
> First thing to do is to determine if this is a case of a system that
> worked in the past, and now doesn't.

Yes. It ran for months, and today it failed to restart.

> If so, ask what you changed that might have broken it (e.g. config change).
> If this is a new deployment that never worked, then I'd recommend
> running the ns-slapd process under strace to see what syscalls it is
> making, then figure out which one fails that might correspond to the
> "out of memory" condition in userspace.
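
A minimal sketch of the strace idea, in Python. It assumes the trace was saved to a file (for example with strace's -o option) and that failed calls use strace's usual form, ending in "= -1 ENOMEM (Cannot allocate memory)". The function name and the sample lines are hypothetical, made up for illustration; they are not output from this server.

```python
import re

# Matches a strace line for a failed call, for example:
#   mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = -1 ENOMEM (Cannot allocate memory)
# An optional leading PID (as written by strace -f) is tolerated.
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)\b"
)


def failed_syscalls(lines, errno_name="ENOMEM"):
    """Return (syscall_name, line) pairs for calls that failed with errno_name."""
    hits = []
    for line in lines:
        stripped = line.strip()
        match = FAILED_CALL.match(stripped)
        if match and match.group(2) == errno_name:
            hits.append((match.group(1), stripped))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, only to show the expected input shape.
    sample = [
        'open("/tmp/example", O_RDONLY) = -1 ENOENT (No such file or directory)',
        "mmap(NULL, 419430400, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = -1 ENOMEM (Cannot allocate memory)",
        "close(5) = 0",
    ]
    for name, line in failed_syscalls(sample):
        print(name, "->", line)
```

The first hit in a real trace would be the candidate for the userspace "Cannot allocate memory" message; the arguments of that call (size, flags) are what to look at next.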

Well, we just added a new database at runtime, which worked fine; 389ds
was still running. After changing a replica I wanted to restart, and
that resulted in the error.

> Also try turning up the logging verbosity to the max. From memory the

How can I achieve this? In dse.ldif I have:

nsslapd-errorlog-level: 32768
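
A minimal sketch of how this value can be read, assuming nsslapd-errorlog-level is an additive bit mask. The table below is a partial list recalled from the 389 DS documentation, so the names are assumptions to be checked against the docs for the installed version; decode_level and combine are hypothetical helpers, not part of 389 DS.

```python
# Partial table of nsslapd-errorlog-level bits.
# Assumption: recalled from the 389 DS documentation; verify for your version.
LEVELS = {
    1: "trace function calls",
    8: "connection management",
    128: "access control list processing",
    8192: "replication debugging",
    16384: "default (critical errors)",
    32768: "database cache debugging",
    65536: "plug-in debugging",
}


def decode_level(value):
    """Split a combined level into the names of the bits that are set."""
    names = []
    remaining = value
    for bit in sorted(LEVELS):
        if value & bit:
            names.append(LEVELS[bit])
            remaining &= ~bit
    if remaining:
        # Bits that are set but not in the partial table above.
        names.append("unknown bits: %d" % remaining)
    return names


def combine(*bits):
    """Add several level bits together into one value for the attribute."""
    total = 0
    for bit in bits:
        total |= bit
    return total


if __name__ == "__main__":
    print(decode_level(32768))
    print(combine(8192, 32768))
```

Under that table, 32768 alone enables only database cache debugging; other areas would be switched on by adding their values to it.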

> cache sizing code might print out its selected sizes. There may be other
> useful debug output you get. You don't need to look at anything in the
> resulting log after that fatal memory allocation error I cited above.
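
A small Python sketch of that advice: find the first libdb message in the error log and ignore what follows. It assumes each entry sits on a single line in the actual log file (the lines quoted in this mail are wrapped) and has the form "[timestamp] - libdb: BDBnnnn message". The function name is hypothetical.

```python
import re

# One error-log entry from libdb, e.g.
#   [13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space ...
LIBDB_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] - libdb: (?P<code>BDB\d+) (?P<msg>.*)$"
)


def first_libdb_error(lines):
    """Return (line_number, code, message) for the first libdb entry, or None."""
    for number, line in enumerate(lines, start=1):
        match = LIBDB_LINE.match(line.rstrip("\n"))
        if match:
            return number, match.group("code"), match.group("msg")
    return None


if __name__ == "__main__":
    # Entries taken from the log quoted in this thread, joined onto one line each.
    log = [
        "[13/Dec/2018:20:17:28 +0100] - 389-Directory/1.3.3.5 B2018.298.1116 starting up",
        "[13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space from the buffer cache",
        "[13/Dec/2018:20:17:29 +0100] - libdb: BDB0061 PANIC: Cannot allocate memory",
    ]
    print(first_libdb_error(log))
```
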
> 
>>
>> There is plenty of disk-space and 2GB Ram
>>
> Hmm...2G ram is very small fwiw, although obviously bigger than the
> machines we originally ran the DS on in the late 90's.

I increased it to 3.5 GB (I don't have more available on the
virtualisation host at the moment), but the result is still the same.

> There's always the possibility that something in the cache auto-sizing
> is just wrong for very small memory machines.
> I think it does some grokking of the physical memory size then tries to
> "auto-size" the caches accordingly.
> There may even be some issue where the physical memory size it gets is
> from the VM host, not the VM (so it would be horribly wrong).

I don't think so, since it worked all the time... What I could imagine is
that the changelogdb files were smaller at the last reboot, so no memory
limit took effect then.

I already have this in /etc/dirsrv.systemd:

[Service]
# uncomment this line to raise the file descriptor limit
LimitNOFILE=10240
LimitCORE=infinity

Kind regards
Jan