On Wed, Jul 23, 2014 at 06:23:24PM -0600, Rich Megginson wrote:
> > Not sure. This is a complex area.
>
> Are you using IdM/FreeIPA or just plain 389?

Just plain old 389 (backend for sssd) with a load balancer in between (which is flapping when we get erratic response times to simple bind attempts). A lightly loaded server is fine, but once we get more than a few dozen connections, the pauses become a real problem...

> Some stack traces taken during the run might be helpful. Please
> see http://port389.org/wiki/FAQ#Debugging_Hangs
> I guess you'll have to replace "yum" with "apt" and I'm not sure how
> the debuginfo packages are handled on Debian.
>

Ah, missed that bit of the website. Attaching a tarball of dumps from a hang this morning. I had an external process "pinging" the LDAP server, doing lookups for a UID 'bazbar.$i' via loopback every few seconds. Its request at 1406210089 went unregistered by the server until 1406210106 (log timestamps below are GMT):

[24/Jul/2014:13:55:06 +0000] conn=301 fd=182 slot=182 connection from ::1 to ::1
[24/Jul/2014:13:55:06 +0000] conn=301 op=0 BIND dn="" method=128 version=3
[24/Jul/2014:13:55:06 +0000] conn=301 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[24/Jul/2014:13:55:06 +0000] conn=301 op=1 SRCH base="ou=People,dc=n,dc=twosigma,dc=com" scope=2 filter="(&(uid=bazbar.7)(objectClass=posixAccount))" attrs=ALL
[24/Jul/2014:13:55:06 +0000] conn=301 op=1 RESULT err=0 tag=101 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=301 op=2 UNBIND
[24/Jul/2014:13:55:06 +0000] conn=301 op=2 fd=182 closed - U1

I've obfuscated a few of the buffers in the stack dumps to keep my security folks happy; that shouldn't cause any trouble, though.
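For anyone wanting to line up the probe's epoch timestamps with the access log, here is a rough sketch of the conversion. The helper name to_access_log_stamp is made up for illustration; it just renders an epoch value in UTC using the same bracketed layout the access log lines above show.

```python
from datetime import datetime, timezone


def to_access_log_stamp(epoch_seconds: int) -> str:
    """Render a Unix epoch in UTC, in the bracketed layout seen in the access log.

    Hypothetical helper for illustration only; not part of 389 itself.
    """
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("[%d/%b/%Y:%H:%M:%S +0000]")


# The two epoch values quoted in the message above.
probe_sent = 1406210089      # when the external probe issued its request
server_logged = 1406210106   # when conn=301 first appears in the access log

print(to_access_log_stamp(probe_sent))
print(to_access_log_stamp(server_logged))
print(server_logged - probe_sent, "seconds between request and first log entry")
```

The second stamp matches the conn=301 lines, so the server took 17 seconds to register a loopback connection that it then answered with etime=0.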
These "hangs" always seem to be followed by a flurry of angry clients when things unjam:

[24/Jul/2014:13:55:04 +0000] conn=299 op=1 ABANDON targetop=NOTFOUND msgid=1
[24/Jul/2014:13:55:06 +0000] conn=302 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=307 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=308 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=309 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=313 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=315 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=316 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=317 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=318 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=319 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=320 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0
[24/Jul/2014:13:55:06 +0000] conn=321 op=1 ABANDON targetop=NOTFOUND msgid=1
[24/Jul/2014:13:55:06 +0000] conn=322 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0

even a few TCP resets:

[24/Jul/2014:13:55:09 +0000] conn=323 op=-1 fd=182 closed error 104 (Connection reset by peer) - TCP connection reset by peer.
[24/Jul/2014:13:55:09 +0000] conn=326 op=-1 fd=165 closed error 104 (Connection reset by peer) - TCP connection reset by peer.
[24/Jul/2014:13:55:11 +0000] conn=330 op=-1 fd=276 closed error 104 (Connection reset by peer) - TCP connection reset by peer.
[24/Jul/2014:13:55:11 +0000] conn=332 op=-1 fd=190 closed error 104 (Connection reset by peer) - TCP connection reset by peer.
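To see how tightly these events cluster around the moment things unjam, something like the following could tally ABANDON operations and connection resets per log second. This is only a sketch: the regular expression and the tally function are my own assumptions based on the line layout shown above, not anything shipped with 389, and the sample lines are copied from the excerpts in this message.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts above: "[stamp] conn=N rest-of-line".
LINE_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\] conn=(?P<conn>\d+) (?P<rest>.*)$")


def tally(lines):
    """Count ABANDON ops and error-104 closes per (timestamp, kind) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like an access-log line
        rest = match.group("rest")
        if " ABANDON " in f" {rest} ":
            kind = "ABANDON"
        elif "closed error 104" in rest:
            kind = "RESET"
        else:
            continue  # other operations (BIND, SRCH, ...) are not tallied here
        counts[(match.group("stamp"), kind)] += 1
    return counts


sample = [
    "[24/Jul/2014:13:55:04 +0000] conn=299 op=1 ABANDON targetop=NOTFOUND msgid=1",
    "[24/Jul/2014:13:55:06 +0000] conn=302 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0",
    "[24/Jul/2014:13:55:06 +0000] conn=307 op=1 ABANDON targetop=0 msgid=1 nentries=0 etime=0",
    "[24/Jul/2014:13:55:09 +0000] conn=323 op=-1 fd=182 closed error 104 (Connection reset by peer) - TCP connection reset by peer.",
]

for (stamp, kind), count in sorted(tally(sample).items()):
    print(stamp, kind, count)
```

Run against the full access log (for example by passing an open file object to tally), the per-second counts should make the burst at 13:55:06 stand out against the quiet seconds before it.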
Attachment: stacktraces.tar.gz
Description: Binary data
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users