On 11/15/2016 12:58 PM, Marc Sauton wrote:
What is the test filter like?
Can we see a sanitized sample of the access log showing the matching SRCH and RESULT lines?
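For example, something along these lines can pull the slow RESULT lines
(and their conn/op numbers, which point back to the matching SRCH lines)
out of the access log; the instance path here is just a placeholder:
# RESULT lines with a non-zero etime; adjust the path to your instance name
grep ' RESULT ' /var/log/dirsrv/slapd-INSTANCE/access | grep -v 'etime=0'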
If using SSL, review the output of
cat /proc/sys/kernel/random/entropy_avail
Is replication configured? (And are there any large attribute values?)
You may want to run the "dbmon.sh" script to monitor cache usage for
the db cache and entry cache. Try to capture a few sample lines showing
dbcachefree and userroot:ent (if the db with the problem is userroot)
while the searches are becoming too long, for example:
INCR=1 HOST=m2.example.com BINDDN="cn=directory manager" BINDPW="password" VERBOSE=2 /usr/sbin/dbmon.sh
Also review the ns-slapd errors log and the system messages log file for
any unusual activity.
What is the ns-slapd memory footprint from restart until responses
become slow?
Any unusually high disk I/O? (Or a "bad" SSD?)
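For example, both can be sampled periodically with something like this
(the pidof lookup and the 5-second interval are just examples):
# resident set size of ns-slapd in KB
ps -o rss= -p $(pidof ns-slapd)
# per-device disk utilization, three 5-second samples
iostat -dx 5 3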
It is also useful to get a few stack traces, which will give us detailed
information about what the server is doing. For example, if you can
"catch" the server while it is misbehaving, get a stack trace every
second for 10 seconds.
http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-hangs
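Assuming gdb (which provides gstack) and the debuginfo packages are
installed, a rough sketch would be:
# one stack trace per second for 10 seconds, written to /tmp
for i in $(seq 1 10); do
    gstack $(pidof ns-slapd) > /tmp/ns-slapd.stack.$i
    sleep 1
done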
Thanks,
M.
On Tue, Nov 15, 2016 at 11:40 AM, Gordon Messmer
<gordon.messmer@xxxxxxxxx> wrote:
I'm trying to track down a problem we are seeing on two relatively
lightly used instances on CentOS 7 (and previously on CentOS 6,
which is no longer in use). Our servers have 3624 entries
according to last night's export (we export userRoot daily).
There are currently just over 400 connections established to each
server.
We have a local cron job that runs every 5 minutes and performs a
simple query. If it takes more than 7 seconds to get an answer,
the attempt is aborted and another query is issued. If three
consecutive tests fail, the directory server is restarted.
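(For context, the check is essentially something like the following; the
base DN and filter here are placeholders, not the real ones we use.)
# abort the client if no answer has come back within 7 seconds
timeout 7 ldapsearch -x -H ldap://localhost -b "dc=example,dc=com" "(uid=healthcheck)" dn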
The issue we're seeing is that the longer the system is up, the
more often the checks fail. Restarting the directory does not
resolve the problem. Our servers have currently been up for 108
days, and the checks are now restarting the service several times a
day. Only if we reboot the systems does the problem subside.
CPU utilization seems relatively high for such a small directory,
but it's not constant. I tried to manually capture a bit of data
with strace during a period when CPU use was spiking. During a
capture of maybe two seconds, I saw that most of the CPU time was
spent in futex. usecs/call was fairly high for calls to futex and
select, as detailed below.
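(For reference, the per-syscall summary below is the kind of output you
get from attaching with something like the following; my exact
invocation may have differed.)
# attach to the running server, Ctrl-C after a couple of seconds to
# print the per-syscall time summary
strace -c -f -p $(pidof ns-slapd)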
Since restarting the service doesn't fix the problem, it seems
most likely that this is an OS bug, but I'm hoping that the list
can help me identify other useful data to track down the problem.
Does anyone have any suggestions for what I can capture now, while
I can sometimes observe the problem? If I reboot, it'll take
months before I can get any new data.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 74.61    4.505251        3590      1255       340 futex
 17.65    1.065548        6660       160           select
  4.41    0.266344       88781         3         2 restart_syscall
  3.07    0.185566          50      3718           poll
  0.10    0.006185           2      3610           sendto
  0.09    0.005189        5189         1           fsync
  0.04    0.002134          37        58           write
  0.03    0.001618          27        61           setsockopt
  0.00    0.000111           3        36           recvfrom
  0.00    0.000078           1        57           read
  0.00    0.000014          14         1           fstat
  0.00    0.000003           2         2           accept
  0.00    0.000003           1         6           fcntl
  0.00    0.000002           1         2           getsockname
  0.00    0.000001           1         2           close
------ ----------- ----------- --------- --------- ----------------
100.00    6.038047                  8972       342 total
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx