On 11/16/2016 01:23 PM, William Brown wrote:
What's your ioblocktimeout set to?
nsslapd-ioblocktimeout: 1800000
How many connections are idle on the server?
How would I check?
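One rough way to check, offered as a sketch rather than a definitive method, is to count the TCP sockets on the LDAP port by state, using the contents of /proc/net/tcp. This assumes the server listens on port 389 over IPv4 (IPv6 sockets are listed in /proc/net/tcp6 instead), and the function name count_states is only illustrative. It reports how many sockets are in each state, not how long each has been idle.

```python
from collections import Counter

# Kernel TCP state codes as they appear (in hex) in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, port=389):
    """Count sockets per TCP state whose local port matches `port`.

    `proc_net_tcp_text` is the full text of /proc/net/tcp.
    The port default of 389 is an assumption; adjust for LDAPS (636).
    """
    counts = Counter()
    # The first line is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT" for the local end of the socket.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts

# Example use on a live system:
#   with open("/proc/net/tcp") as fh:
#       print(count_states(fh.read()))
```

A large and growing ESTABLISHED or CLOSE_WAIT count as uptime increases would be worth comparing against the descriptor limit.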
Are you seeing OOM behaviour or memory not being released to the OS?
No, the systems use very little memory:
# free
              total        used        free      shared  buff/cache   available
Mem:        1883872      148932       72752       97156     1662188     1429468
Swap:       2097148       65064     2032084
No OOM actions are recorded.
What specs are your servers, i.e. CPU and memory? Is it ECC memory?
These are virtual machines with 4 allocated cores and 2GB of RAM. The
host systems are Intel(R) Xeon(R) CPU E5-2620 v3 with 64GB of ECC RAM.
The two VMs running 389-ds are on different physical hosts, but have the
same problems at roughly the same frequency, at roughly the same uptime.
What kind of disk are they on? Are there issues in dmesg?
One physical system has a RAID10 mdraid array of SAS disks. The other
has a RAID1 mdraid array of SAS disks. No errors have been recorded.
The virtual machines are LVM-backed with standard (not sparse) LVs.
Have you configured system activity reporter (sar), and do you have output
from the same time for disk I/O, memory usage, CPU, etc.?
I believe that's set up by default, yes.
https://paste.fedoraproject.org/483468/93401501/
The DS stopped responding at about 12:30AM in this readout (system time
is in UTC).
What's your sysctl setup like?
Standard for a CentOS 7 system, with these additions:
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_keepalive_time = 600
Have you increased file descriptors for Directory Server?
I thought I had, but it looks like I haven't:
# cat /proc/sys/fs/file-max
185059
# grep nofile /etc/security/limits.conf
# - nofile - max number of open file descriptors
# grep ulimit /etc/profile
nsslapd-maxdescriptors: 1024
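To see what limit the running process actually received, as opposed to what is configured, one option is to read /proc/<pid>/limits and count the entries under /proc/<pid>/fd. The following is a minimal sketch assuming Linux procfs; the function names are hypothetical and the pid would be that of the ns-slapd process.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of None means the limit is reported as "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files <soft> <hard> files"
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v)
                         for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process.

    Reading another user's /proc/<pid>/fd usually requires root.
    """
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_max_open_files(fh.read())
    return len(os.listdir(f"/proc/{pid}/fd")), soft
```

If the open count sits close to the soft limit around the time the server stops responding, descriptor exhaustion would be a plausible suspect.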
Have you lowered the TCP close wait timeout?
No.
When I hear of problems like this, I'm always inclined to investigate
the host first, as there is a surprising amount that can affect DS from
the host.
I suspect so, too, since the problem correlates with the system uptime,
not how long the daemon has been running. But beyond that I'm not sure
how to track this down further.