I'm trying to track down a problem we are seeing on two relatively
lightly used instances on CentOS 7 (and previously on CentOS 6, which is
no longer in use). Our servers have 3624 entries according to last
night's export (we export userRoot daily). There are currently just
over 400 connections established to each server.

We have a local cron job that runs every 5 minutes and performs a
simple query. If it takes more than 7 seconds to get an answer, the
attempt is aborted and another query is issued. If three consecutive
tests fail, the directory server is restarted.
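
To make the check logic concrete, here is a minimal Python sketch of the
behaviour described above. It illustrates the retry-then-restart pattern
only and is not the actual cron script: the function names, the probe
command, and the restart hook are all hypothetical placeholders.

```python
import subprocess
from typing import Callable, Sequence

TIMEOUT_SECONDS = 7   # a query taking longer than this is aborted
MAX_FAILURES = 3      # consecutive failures before the server is restarted


def query_succeeds(command: Sequence[str], timeout: float = TIMEOUT_SECONDS) -> bool:
    """Run one test query; a timeout, launch error, or non-zero exit is a failure."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it exceeds the timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_and_restart(
    probe: Callable[[], bool],
    restart: Callable[[], None],
    max_failures: int = MAX_FAILURES,
) -> bool:
    """Probe up to max_failures times; restart only if every attempt fails."""
    for _ in range(max_failures):
        if probe():
            return True   # one good answer ends the check
    restart()             # every attempt failed
    return False
```

A caller would pass a probe that wraps `query_succeeds` around whatever
search the cron job runs, and a `restart` callable that restarts the
directory service. Both are left abstract here because the exact query
and restart command are not part of this description.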

The issue we're seeing is that the longer the system is up, the more
often the checks fail. Restarting the directory server does not resolve
the problem. Our servers have currently been up for 108 days, and the
checks are now restarting the service several times a day. The problem
subsides only when we reboot the systems.

CPU utilization seems relatively high for such a small directory, but
it's not constant. I manually captured a bit of data with strace during
a period when CPU use was bursting high. During a capture of maybe two
seconds, most of the CPU time was spent in futex. The usecs/call figure
was also fairly high for futex and select, as shown in the strace
summary at the end of this message.

Since restarting the service doesn't fix the problem, it seems most
likely that this is an OS bug, but I'm hoping that the list can help me
identify other useful data to track down the problem. Does anyone have
any suggestions for what I can capture now, while I can sometimes
observe the problem? If I reboot, it'll take months before I can get
any new data.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 74.61    4.505251        3590      1255       340 futex
 17.65    1.065548        6660       160           select
  4.41    0.266344       88781         3         2 restart_syscall
  3.07    0.185566          50      3718           poll
  0.10    0.006185           2      3610           sendto
  0.09    0.005189        5189         1           fsync
  0.04    0.002134          37        58           write
  0.03    0.001618          27        61           setsockopt
  0.00    0.000111           3        36           recvfrom
  0.00    0.000078           1        57           read
  0.00    0.000014          14         1           fstat
  0.00    0.000003           2         2           accept
  0.00    0.000003           1         6           fcntl
  0.00    0.000002           1         2           getsockname
  0.00    0.000001           1         2           close
------ ----------- ----------- --------- --------- ----------------
100.00    6.038047                  8972       342 total
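
For anyone reading the summary, the usecs/call and % time columns can be
rechecked from the seconds and calls columns. The short Python sketch
below does that for the three busiest syscalls. The figures are copied
from the table above; the helper name and the dictionary layout are just
illustrative.

```python
def usecs_per_call(seconds: float, calls: int) -> float:
    """Average time per call in microseconds, as in strace's usecs/call column."""
    return seconds * 1_000_000 / calls


# (seconds, calls) copied from the strace summary above
rows = {
    "futex": (4.505251, 1255),
    "select": (1.065548, 160),
    "poll": (0.185566, 3718),
}
total_seconds = 6.038047  # "seconds" value on the total line

for name, (seconds, calls) in rows.items():
    share = 100 * seconds / total_seconds  # corresponds to the % time column
    print(f"{name:8s} {usecs_per_call(seconds, calls):8.0f} usecs/call "
          f"{share:6.2f}% of traced time")
```

The computed values agree with the table: futex averages about 3.6 ms
per call and select about 6.7 ms, while poll, which is called far more
often, averages only about 50 microseconds.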