On 2011-11-18 19:49, Rich Megginson wrote:
> On 11/18/2011 11:46 AM, Daniel Fenert wrote:
>> On 2011-11-18 14:42, Rich Megginson wrote:
>>> On 11/18/2011 05:08 AM, Daniel Fenert wrote:
>>>> Hi,
>>>>
>>>> I'm using 389ds 1.2.5 with replication, my current setup:
>>>>
>>>>   Master
>>>>   |    \
>>>>   L1    L2
>>>>   | \   | \
>>>>   S1 S2 S3 S4
>>>>
>>>> L* - acting as slave to "master" and master to "S*"
>>>> S* - slaves to L*
>>>>
>>>>
>>>> From time to time (usually few months between problems) we encounter
>>>> "master" going to some infinite loop.
>>>> After analyzing access log, it looks like it stops doing queries, and
>>>> accepts new connections until it runs out of fd's.
>>>> After that, it won't stop peacefully, only SIGKILL saves the day.
>>>>
>>>> Workload:
>>>> Master is used only for updates, maybe 20 connections/s.
>>>> L* are used only for replication.
>>>> All bind's and search queries are targeted to S* which are read only.
>>>>
>>>> With previous setup (less complicated), we've also seen this problem:
>>>>   Master
>>>>   |  |  |  \
>>>>   S1 S2 S3 S4...
>>>>
>>>> Is there a chance that upgrading to latest version will fix the
>>>> problem?
>>>> Were there any fixes nearby? Upgrade will be complex as hell ;)
>>>>
>>>> Error log from last problem:
>>>> - Not listening for new connections - too many fds open
>>> Have you tried increasing the number of fds to 8192?
>>
>> Yes, but it doesn't make sense - during normal operation master uses
>> no more than 50-60 fd's.
> Right. I'm not suggesting this is the root cause of the problem, but
> increasing the number of fds could help reduce the occurrence of the
> problem.

By the time the number of fds in use started to grow, the server had
already stopped running queries. I think giving it more fds would only
delay, by a few minutes, the log message saying that it stopped accepting
new connections :)

>>
>>>> - slapd shutting down - signaling operation threads
>>>> - slapd shutting down - waiting for 120 threads to terminate
>>> Does the server shutdown on its own, or did you shut it down
>>> normally (i.e. service dirsrv stop)?
>>
>> We have tried to stop it using init.d scripts.
> 120 threads? Did you increase nsslapd-threadnumber?
> If not, then I'm very curious about what all those threads are doing.

Yes, we raised the number of threads a long time ago, when the master was
also used for queries and we hit performance problems. Nowadays these
threads just sit idle; I forgot to bring the thread number back down.

>>
>>>> ... SIGKILL ...
>>>> - 389-Directory/1.2.5 B2010.012.2034 starting up
>>>> - Detected Disorderly Shutdown last time Directory Server was
>>>> running,
>>>> recovering database.
>>>> - slapd started. Listening on All Interfaces port 389 for LDAP
>>>> requests
>>>>
>>>> Number of fds: 4096.
>>> Since 1.2.5 we have fixed a number of bugs around connection
>>> handling. You might find that 1.2.9.9 (current stable version)
>>> works much better for you.
>>
>> OK, we'll try to upgrade.
>>
>> How to upgrade such complex setup?
>> Should we try top-to-bottom approach (master first, then L*, then S*)
>> or bottom-to-top (S*, L*, master last)?
> bottom to top

Thanks, we'll try in the coming weeks.

>> Shutting down all servers is not really an option.
>>
>

--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users
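
For planning purposes, the "bottom to top" upgrade order recommended in the thread can be worked out mechanically from the replication topology. The following is only a minimal sketch: the `TOPOLOGY` dictionary is my reading of the diagram in the original post (L1 feeding S1 and S2, L2 feeding S3 and S4), and `upgrade_order` is a hypothetical helper, not anything shipped with 389 DS.

```python
from collections import deque

# Assumed replication topology, read from the diagram in the thread:
# each supplier maps to the consumers it replicates to.
TOPOLOGY = {
    "Master": ["L1", "L2"],
    "L1": ["S1", "S2"],
    "L2": ["S3", "S4"],
}


def upgrade_order(topology, root):
    """Group servers by tier and return the tiers deepest-first.

    Hypothetical helper: a breadth-first walk from the top-level master
    assigns each server its distance from the root; tiers are then
    returned in reverse so that read-only consumers come first and the
    master comes last.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in topology.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    tiers = {}
    for node, level in depth.items():
        tiers.setdefault(level, []).append(node)
    return [sorted(tiers[level]) for level in sorted(tiers, reverse=True)]


if __name__ == "__main__":
    for step, tier in enumerate(upgrade_order(TOPOLOGY, "Master"), start=1):
        print(f"step {step}: {', '.join(tier)}")
```

Servers within one tier can be upgraded one at a time, so the remaining servers in that tier keep serving clients and not everything has to be shut down at once.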