Re: DS crashed /killed by OS

German Parente <gparente@xxxxxxxxxx> · Fri, 23 Oct 2015 03:01:03 -0400 (EDT)

Hi Trevor,

Good to know it's working fine. Thanks for your feedback.

Regards,

German.

----- Original Message -----
> From: "Trevor Fong" <trevor.fong@xxxxxx>
> To: "General discussion list for the 389 Directory server project." <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
> Sent: Thursday, October 22, 2015 7:48:48 PM
> Subject: Re:  DS crashed /killed by OS
> 
> Hi German,
> 
> Thanks for your suggestion. I’m happy to confirm that setting userRoot’s
> nsslapd-cachememsize to 429496730 (1/15th of the previous value of 6 GB) has
> addressed the memory issue for now, and %MEM for the ns-slapd process seems
> to be at a manageable level.
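A minimal sketch of how a change like this could be prepared for LDAP: the Python below only builds the LDIF modify record and does not touch the server. It assumes the usual 389 DS location of the userRoot backend entry (cn=userRoot,cn=ldbm database,cn=plugins,cn=config); the function name cachememsize_ldif is hypothetical, so verify the DN against your own instance before applying anything.

```python
def cachememsize_ldif(size_bytes: int, backend: str = "userRoot") -> str:
    """Build an LDIF modify record setting nsslapd-cachememsize on one backend.

    Assumes the usual 389 DS backend entry layout:
    cn=<backend>,cn=ldbm database,cn=plugins,cn=config
    """
    if size_bytes <= 0:
        raise ValueError("cache size must be a positive number of bytes")
    dn = f"cn={backend},cn=ldbm database,cn=plugins,cn=config"
    return (
        f"dn: {dn}\n"
        "changetype: modify\n"
        "replace: nsslapd-cachememsize\n"
        f"nsslapd-cachememsize: {size_bytes}\n"
    )


if __name__ == "__main__":
    # Print the record for the value used in this thread.
    print(cachememsize_ldif(429496730), end="")
```

The output could be saved to a file (cache.ldif is a hypothetical name) and applied with something like `ldapmodify -x -D "cn=Directory Manager" -W -f cache.ldif`.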
> 
> Thanks very much,
> Trev
> 
> 
> 
> 
> On 2015-10-20, 11:07 AM, "389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx on behalf
> of German Parente" <389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx on behalf of
> gparente@xxxxxxxxxx> wrote:
> 
> >
> >Hi Trevor,
> >
> >400 MB could be a more reasonable value. With a 6 GB cache, fragmentation
> >could very quickly trigger the OOM killer.
> >
> >Regards,
> >
> >German.
> >
> >----- Original Message -----
> >> From: "Trevor Fong" <trevor.fong@xxxxxx>
> >> To: "General discussion list for the 389 Directory server project."
> >> <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
> >> Sent: Tuesday, October 20, 2015 7:44:06 PM
> >> Subject: Re:  DS crashed /killed by OS
> >> 
> >> Hi German,
> >> 
> >> Thanks very much for your reply.
> >> Just to make sure I have it straight, I’ve currently got userRoot’s
> >> nsslapd-cachememsize = 6 GB on a 16 GB machine.
> >> I should change that to nsslapd-cachememsize = 6 GB / 15 = 429496730.
> >> Do I have that right?
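The division can be double-checked in a few lines of Python, reading "GB" as binary gigabytes (2^30 bytes), which is the interpretation that reproduces 429496730:

```python
GIB = 1024 ** 3  # "GB" in the thread read as 2**30 bytes

old_cache = 6 * GIB                 # previous nsslapd-cachememsize, in bytes
new_cache = round(old_cache / 15)   # 429496729.6 rounded to whole bytes

print(old_cache)                    # 6442450944
print(new_cache)                    # 429496730
print(new_cache / 1024 ** 2)        # about 409.6 (MiB)
```

The result is about 409.6 MiB, in line with the roughly 400 MB suggested in the reply.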
> >> 
> >> Thanks again,
> >> Trev
> >> 
> >> 
> >> 
> >> 
> >> On 2015-10-20, 10:23 AM, "389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx on
> >> behalf
> >> of German Parente" <389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx on behalf of
> >> gparente@xxxxxxxxxx> wrote:
> >> 
> >> >Hi Trevor,
> >> >
> >> >No problem. In fact, this issue has been investigated by the experts and
> >> >it's due to fragmentation. A fix that switches to a different memory
> >> >allocator is being tested internally but has not been delivered yet.
> >> >
> >> >The official workaround is different from the one I proposed: define a
> >> >rather small entry cache, since the fragmentation can reach roughly
> >> >
> >> >15 * size of entry cache.
> >> >
> >> >So we need (15 * size of entry cache) < available memory.
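The rule of thumb above can be written as a small check. This is a sketch, assuming the factor of 15 from this thread (an empirical estimate, not a guaranteed bound) and that "available memory" means whatever you budget for ns-slapd after other processes; the function names are hypothetical.

```python
GIB = 1024 ** 3
FRAGMENTATION_FACTOR = 15  # empirical factor quoted in this thread, not a hard bound


def cache_fits(cache_bytes: int, available_bytes: int,
               factor: int = FRAGMENTATION_FACTOR) -> bool:
    """Check the rule of thumb: factor * entry cache size < available memory."""
    return factor * cache_bytes < available_bytes


def max_cache_bytes(available_bytes: int,
                    factor: int = FRAGMENTATION_FACTOR) -> int:
    """Largest whole-byte cache size that still satisfies cache_fits()."""
    # For integers: factor * c < available  <=>  c <= (available - 1) // factor
    return (available_bytes - 1) // factor


if __name__ == "__main__":
    print(cache_fits(6 * GIB, 16 * GIB))      # False: 90 GiB is far above 16 GiB
    print(cache_fits(429496730, 16 * GIB))    # True: about 6 GiB after the factor
    print(max_cache_bytes(16 * GIB))          # 1145324612
```

On a 16 GiB machine the ceiling comes out at roughly 1.07 GiB, before subtracting memory needed by the database cache and other processes.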
> >> >
> >> >Thanks and regards,
> >> >
> >> >German.
> >> >
> >> >
> >> >
> >> >----- Original Message -----
> >> >> From: "Trevor Fong" <trevor.fong@xxxxxx>
> >> >> To: "General discussion list for the 389 Directory server project."
> >> >> <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
> >> >> Sent: Tuesday, October 20, 2015 7:09:46 PM
> >> >> Subject: Re:  DS crashed /killed by OS
> >> >> 
> >> >> Hi German,
> >> >> 
> >> >> Apologies for resurrecting an old thread.
> >> >> We're also experiencing something similar.  We're currently running
> >> >> 389-ds-base-1.2.11.15-48.el6_6.x86_64
> >> >> 
> >> >> I'm afraid I don't have the login privileges needed to view the details
> >> >> of the bug you linked.
> >> >> Could you please post details of how you defined an entry cache that
> >> >> includes the whole db, and why this works?
> >> >> 
> >> >> FYI, plans are under way to upgrade DS on a set of new servers, but in
> >> >> the meantime we need to address this issue.
> >> >> 
> >> >> 
> >> >> Thanks a lot,
> >> >> Trev
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> On 2015-02-05, 1:57 AM, "389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx on
> >> >> behalf
> >> >> of German Parente" <389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx on behalf
> >> >> of
> >> >> gparente@xxxxxxxxxx> wrote:
> >> >> 
> >> >> >
> >> >> >Hi,
> >> >> >
> >> >> >We have had several customer cases showing this behavior. In one of
> >> >> >these cases, we confirmed it was due to memory fragmentation after
> >> >> >cache thrashing.
> >> >> >
> >> >> >We stopped seeing this behavior by defining an entry cache large enough
> >> >> >to hold the whole db (when possible, of course).
> >> >> >
> >> >> >Details can be found at:
> >> >> >
> >> >> >https://bugzilla.redhat.com/show_bug.cgi?id=1186512
> >> >> >Apparent memory leak in ns-slapd; OOM-Killer invoked
> >> >> >
> >> >> >Regards,
> >> >> >
> >> >> >German
> >> >> >
> >> >> >----- Original Message -----
> >> >> >> From: "David Boreham" <david_list@xxxxxxxxxxx>
> >> >> >> To: 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> >> >> >> Sent: Wednesday, February 4, 2015 8:50:55 PM
> >> >> >> Subject: Re:  DS crashed /killed by OS
> >> >> >> 
> >> >> >> On 2/4/2015 11:20 AM, ghiureai wrote:
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> Out of memory: Kill process 2090 (ns-slapd) score 954 or sacrifice child
> >> >> >> 
> >> >> >> It wasn't clear to me from your post whether you already have a good
> >> >> >> understanding of the OOM killer behavior in the kernel.
> >> >> >> On the chance that you're not yet familiar with its ways, I suggest
> >> >> >> reading, for example, this article:
> >> >> >> http://unix.stackexchange.com/questions/153585/how-oom-killer-decides-which-process-to-kill-first
> >> >> >> I mention this because it may not be the DS that is the problem (I'm
> >> >> >> not saying that it absolutely is not, but it might not be).
> >> >> >> The OOM killer picks a process that is using a large amount of memory
> >> >> >> and kills it in order to preserve system stability.
> >> >> >> This does not necessarily imply that the process it kills is the one
> >> >> >> causing the system to run out of memory.
> >> >> >> You said that the DS "crashed", but in fact the kernel killed it, which
> >> >> >> is not quite the same thing!
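On Linux, the kernel exposes each process's current OOM badness score in /proc/<pid>/oom_score; processes with higher scores are more likely to be chosen, and the "score 954" in the log line above is a score of that kind. A minimal Python sketch that lists the likeliest victims, assuming a Linux host with /proc mounted (the helper names are hypothetical):

```python
import os
from typing import Dict, List, Tuple


def read_oom_scores(proc_root: str = "/proc") -> Dict[int, int]:
    """Map pid -> current oom_score for every process we can read."""
    scores: Dict[int, int] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "self"
        path = os.path.join(proc_root, entry, "oom_score")
        try:
            with open(path) as fh:
                scores[int(entry)] = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # process exited or the file is unreadable
    return scores


def likeliest_victims(scores: Dict[int, int], n: int = 5) -> List[Tuple[int, int]]:
    """Return up to n (pid, score) pairs, highest oom_score first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, score in likeliest_victims(read_oom_scores()):
        print(pid, score)
```

Running this periodically on the affected host shows whether ns-slapd really is the top candidate, or whether something else is consuming the memory.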
> >> >> >> 
> >> >> >> It is also possible that the system simply has insufficient memory for
> >> >> >> the processes it is running, the DS cache size, and so on.
> >> >> >> It is certainly worth checking that the DS hasn't been inadvertently
> >> >> >> configured to use more peak memory than the machine has available.
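One way to do that check is to compare the sum of the DS cache settings (for example nsslapd-cachememsize for each backend, plus nsslapd-dbcachesize) against MemTotal from /proc/meminfo. The sketch below assumes a Linux host; the cache sizes in the example are hypothetical, and the comparison ignores per-entry overhead, fragmentation and other processes, so passing it is necessary rather than sufficient.

```python
import os
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, converting kB to bytes."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip malformed or empty lines
        amount = int(parts[0])
        # Fields with a "kB" unit are in KiB; others (e.g. HugePages_Total) are counts.
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def configured_peak_fits(cache_sizes: List[int], meminfo: Dict[str, int]) -> bool:
    """True if the summed cache settings (bytes) stay below MemTotal."""
    return sum(cache_sizes) < meminfo["MemTotal"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    # Hypothetical settings: one entry cache plus a 1 GiB database cache.
    caches = [429496730, 1024 ** 3]
    print("fits in MemTotal:", configured_peak_fits(caches, info))
```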
> >> >> >> 
> >> >> >> Bottom line: there are a few potential explanations, including but not
> >> >> >> limited to a memory leak in the DS process.
> >> >> >> Some analysis will be needed to identify the cause. As a precaution, if
> >> >> >> you can, configure more swap space on the box.
> >> >> >> This will give you more runway before the kernel starts looking for
> >> >> >> processes to kill, and hence more time to figure out what's using
> >> >> >> memory and why.
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> --
> >> >> >> 389 users mailing list
> >> >> >> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> >> >> >> https://admin.fedoraproject.org/mailman/listinfo/389-users