On 10/13/2011 12:30 PM, Justin Gronfur wrote:
> Well, I figured it out!
>
> After removing the Java stack completely from the equation, I noticed
> that I was getting worse, but more consistent performance. The worse
> performance indicated something to do with caching, so after I dug
> into the Java code for a while I found that we were caching certain
> object classes (therefore making them not need to be rebuilt from
> ldap). The timeout on that cache was 5 minutes.
>
> So during load testing, the tests run great until everything in that
> cache expires at once making all 15 threads hit 389 to rebuild the
> cache entries at the exact same time. My data sample used for testing
> involved about 150 objects being cached, which takes 150 queries to
> generate. So when the cache expires, 15*150=2250 queries hit 389 at
> the exact same time which causes it to seemingly lock up for 20
> seconds while it handles all of that.
>
> I am still surprised that 389 can't handle that better, but I haven't
> spent much time tweaking settings or indexes for performance. I'm
> sure that we are running very sub-optimally right now.

The stack traces you provided give us some clues. I would suggest
lowering nsslapd-threadnumber in cn=config to see if that makes a
difference - one factor could be excessive thread contention based on
your stack traces. You can use the logconv.pl script to analyze your
access log to see if you need to tune your indexes or other performance
related settings such as nsslapd-lookthroughlimit or
nsslapd-idlistscanlimit.

>
> Thanks,
> Justin
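
On the client side, the simultaneous expiry described above could
probably be softened in the cache itself. The following is only a rough
sketch, not the code from the application in question: the class name
JitteredCache, its constructor parameters, and the loader function
(standing in for whatever performs the LDAP lookup) are all assumptions
made for illustration. It spreads expiry times with a random jitter and
lets only one thread rebuild a given entry while the others wait for
that result, so 15 threads would issue roughly 150 queries rather than
2250.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical cache: jittered TTLs, and one loader call per key at a time. */
final class JitteredCache<K, V> {

    /** A cached (possibly still loading) value and its expiry time. */
    private static final class Entry<V> {
        final CompletableFuture<V> value;
        final long expiresAtNanos;

        Entry(CompletableFuture<V> value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // assumed to wrap the LDAP query
    private final long baseTtlNanos;
    private final long jitterNanos;

    JitteredCache(long baseTtlMillis, long jitterMillis, Function<K, V> loader) {
        this.baseTtlNanos = TimeUnit.MILLISECONDS.toNanos(baseTtlMillis);
        this.jitterNanos = TimeUnit.MILLISECONDS.toNanos(jitterMillis);
        this.loader = loader;
    }

    V get(K key) {
        final long now = System.nanoTime();
        final CompletableFuture<V> mine = new CompletableFuture<>();
        // Each entry gets its own random extra lifetime, so entries that were
        // loaded together do not all expire in the same instant.
        final long jitter = ThreadLocalRandom.current().nextLong(jitterNanos + 1);
        final Entry<V> fresh = new Entry<>(mine, now + baseTtlNanos + jitter);

        // Atomically decide who reloads: only the thread whose placeholder is
        // installed calls the loader; everyone else reuses the existing entry.
        final Entry<V> current = entries.compute(key, (k, old) ->
                (old == null || old.expiresAtNanos - now <= 0) ? fresh : old);

        if (current == fresh) {
            try {
                // Load outside the map's internal lock.
                mine.complete(loader.apply(key));
            } catch (RuntimeException e) {
                entries.remove(key, fresh);      // do not cache the failure
                mine.completeExceptionally(e);
                throw e;
            }
        }
        // Waiters block here until the loading thread completes the future.
        return current.value.join();
    }
}
```

With the 5 minute timeout mentioned above, something like
new JitteredCache<>(300_000, 60_000, dn -> lookup(dn)) would give each
entry a lifetime between 5 and 6 minutes, where lookup is a placeholder
for the existing query code. This does not replace tuning the server
settings mentioned above; it only reduces the size of the burst the
server sees.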