Re: performance degrades over time on CentOS 7

On 11/15/2016 12:58 PM, Marc Sauton wrote:
What does the test filter look like?
Can we see a sanitized sample of the access log showing the SRCH and RESULT lines?
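If it helps, a hypothetical one-liner for pulling the slow operations out of the access log (the instance path and the 7-second threshold are assumptions, chosen to mirror the health check described in the quoted message below):

    # RESULT lines with etime of 7 seconds or more, in the default log location
    grep ' RESULT ' /var/log/dirsrv/slapd-INSTANCE/access | grep -E 'etime=([7-9]|[0-9]{2,})'
    # the matching SRCH line shares the same conn= and op= numbers, e.g.
    grep 'conn=1234 op=5 ' /var/log/dirsrv/slapd-INSTANCE/access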

If using SSL, review the output of
cat /proc/sys/kernel/random/entropy_avail
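A minimal sketch for keeping an eye on that over time (the interval and the idea of logging it are arbitrary choices, not something specific to 389-ds):

    # sample the available entropy every 5 seconds; persistently low values
    # can stall SSL handshakes
    while true; do
        printf '%s %s\n' "$(date +%T)" "$(cat /proc/sys/kernel/random/entropy_avail)"
        sleep 5
    done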

Do we have replication? (and large attribute values?)

You may want to run the "dbmon.sh" script to monitor cache usage for the db cache and entry cache. Try to capture a few samples of the lines about dbcachefree and userroot:ent (if the db with the problems is userRoot) when the searches are becoming too long, like this example:

    INCR=1 HOST=m2.example.com BINDDN="cn=directory manager" BINDPW="password" VERBOSE=2 /usr/sbin/dbmon.sh
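One hypothetical way to capture those samples with timestamps, so the cache numbers can be lined up against the slow searches afterwards (the host, bind DN, and password are the placeholders from the example above; the output path is arbitrary):

    INCR=1 HOST=m2.example.com BINDDN="cn=directory manager" BINDPW="password" VERBOSE=2 \
        /usr/sbin/dbmon.sh | grep --line-buffered -E 'dbcachefree|userroot:ent' | \
        while read -r line; do printf '%s %s\n' "$(date +%T)" "$line"; done >> /var/tmp/dbmon-samples.log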

Also review the ns-slapd errors log and the system messages log file for any unusual activity.

What is the ns-slapd memory footprint from restart to slow responses?
Is there any "too high" disk I/O? (Or a "bad" SSD?)

It is also useful to get a few stack traces, which will give us detailed information about what the server is doing. For example, if you can "catch" the server while it is misbehaving, get stack traces every second for 10 seconds. http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-hangs
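A minimal sketch of that capture, assuming gstack (from the gdb package) and the 389-ds-base debuginfo packages are installed as the linked FAQ recommends:

    # one stack trace per second for ten seconds while the server is slow
    for i in $(seq 1 10); do
        gstack "$(pidof ns-slapd)" > "/var/tmp/stacktrace.$(date +%s).$i.txt"
        sleep 1
    done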


Thanks,
M.

On Tue, Nov 15, 2016 at 11:40 AM, Gordon Messmer <gordon.messmer@xxxxxxxxx> wrote:

    I'm trying to track down a problem we are seeing on two relatively
    lightly used instances on CentOS 7 (and previously on CentOS 6,
    which is no longer in use).  Our servers have 3624 entries
    according to last night's export (we export userRoot daily).  There
    are currently just over 400 connections established to each
    server.

    We have a local cron job that runs every 5 minutes that performs a
    simple query.  If it takes more than 7 seconds to get an answer,
    the attempt is aborted and another query is issued.  If three
    consecutive tests fail, the directory server is restarted.
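    A minimal sketch of a check along these lines (the base DN, filter,
    and dirsrv instance name below are placeholders, not our real values):

        #!/bin/bash
        # three consecutive 7-second timeouts trigger a restart
        failures=0
        while [ "$failures" -lt 3 ]; do
            if timeout 7 ldapsearch -x -H ldap://localhost \
                   -b "dc=example,dc=com" "(uid=monitor)" dn >/dev/null 2>&1; then
                exit 0      # got an answer in time
            fi
            failures=$((failures + 1))
        done
        systemctl restart dirsrv@example.service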

    The issue we're seeing is that the longer the system is up, the
    more often checks will fail.  Restarting the directory does not
    resolve the problem.  Our servers have currently been up for 108
    days, and are restarting the service several times a day, as a
    result of the checks.  Only if we reboot the systems does the
    problem subside.

    CPU utilization seems relatively high for such a small directory,
    but it's not constant.  I tried to manually capture a bit of data
    with strace during a period when CPU use was spiking.  During a
    capture of maybe two seconds, I saw that most of the CPU time was
    spent in futex.  usecs/call was fairly high for calls to futex and
    select, as detailed below.
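    For reference, a summary like the one below can be produced with
    something along these lines (a reconstruction, not the exact command
    that was used):

        # attach to all ns-slapd threads and summarize syscall counts/times;
        # strace prints the -c summary once it is interrupted after ~2 seconds
        timeout -s INT 2 strace -c -f -p "$(pidof ns-slapd)"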

    Since restarting the service doesn't fix the problem, it seems
    most likely that this is an OS bug, but I'm hoping that the list
    can help me identify other useful data to track down the problem.
    Does anyone have any suggestions for what I can capture now, while
    I can sometimes observe the problem?  If I reboot, it'll take
    months before I can get any new data.


    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     74.61    4.505251        3590      1255       340 futex
     17.65    1.065548        6660       160           select
      4.41    0.266344       88781         3         2 restart_syscall
      3.07    0.185566          50      3718           poll
      0.10    0.006185           2      3610           sendto
      0.09    0.005189        5189         1           fsync
      0.04    0.002134          37        58           write
     0.03    0.001618          27        61           setsockopt
      0.00    0.000111           3        36           recvfrom
      0.00    0.000078           1        57           read
      0.00    0.000014          14         1           fstat
      0.00    0.000003           2         2           accept
      0.00    0.000003           1         6           fcntl
     0.00    0.000002           1         2           getsockname
      0.00    0.000001           1         2           close
    ------ ----------- ----------- --------- --------- ----------------
    100.00    6.038047                  8972       342 total




_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
