On 11/18/2011 11:46 AM, Daniel Fenert wrote:
> On 2011-11-18 14:42, Rich Megginson wrote:
>> On 11/18/2011 05:08 AM, Daniel Fenert wrote:
>>> Hi,
>>>
>>> I'm using 389ds 1.2.5 with replication, my current setup:
>>>
>>>      Master
>>>      |    \
>>>     L1     L2
>>>     | \    | \
>>>    S1 S2  S3 S4
>>>
>>> L* - acting as slave to "master" and master to "S*"
>>> S* - slaves to L*
>>>
>>>
>>> From time to time (usually a few months between problems) we encounter
>>> "master" going into some infinite loop.
>>> After analyzing the access log, it looks like it stops doing queries, and
>>> accepts new connections until it runs out of fds.
>>> After that, it won't stop peacefully, only SIGKILL saves the day.
>>>
>>> Workload:
>>> Master is used only for updates, maybe 20 connections/s.
>>> L* are used only for replication.
>>> All binds and search queries are targeted to S*, which are read only.
>>>
>>> With the previous setup (less complicated), we've also seen this problem:
>>>
>>>      Master
>>>    |  |  |  \
>>>   S1 S2 S3 S4...
>>>
>>> Is there a chance that upgrading to the latest version will fix the
>>> problem?
>>> Were there any fixes nearby? Upgrade will be complex as hell ;)
>>>
>>> Error log from last problem:
>>> - Not listening for new connections - too many fds open
>> Have you tried increasing the number of fds to 8192?
>
> Yes, but it doesn't make sense - during normal operation master uses
> no more than 50-60 fds.
Right.  I'm not suggesting this is the root cause of the problem, but
increasing the number of fds could help reduce the occurrence of the
problem.
>
>>> - slapd shutting down - signaling operation threads
>>> - slapd shutting down - waiting for 120 threads to terminate
>> Does the server shut down on its own, or did you shut it down normally
>> (i.e. service dirsrv stop)?
>
> We have tried to stop it using init.d scripts.
120 threads?  Did you increase nsslapd-threadnumber?  If not, then I'm
very curious about what all those threads are doing.
>
>>> ... SIGKILL ...
>>> - 389-Directory/1.2.5 B2010.012.2034 starting up
>>> - Detected Disorderly Shutdown last time Directory Server was
>>> running,
>>> recovering database.
>>> - slapd started. Listening on All Interfaces port 389 for LDAP
>>> requests
>>>
>>> Number of fds: 4096.
>> Since 1.2.5 we have fixed a number of bugs around connection
>> handling. You might find that 1.2.9.9 (current stable version) works
>> much better for you.
>
> OK, we'll try to upgrade.
>
> How to upgrade such complex setup?
> Should we try top-to-bottom approach (master first, then L*, then S*)
> or bottom-to-top (S*, L*, master last)?
bottom to top
> Shutting down all servers is not really an option.
>
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users
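
For the upgrade-order question in the thread, here is a minimal sketch of
what "bottom to top" could look like for the topology described. It only
computes an order, it does not touch any server. The TOPOLOGY mapping is an
assumption read off the diagram (S1 and S2 under L1, S3 and S4 under L2),
and upgrade_order is a hypothetical helper, not part of 389 Directory
Server. Upgrading one server at a time within a tier is also an assumption,
made so that the other servers in that tier keep serving requests.

```python
from collections import deque

# Assumed replication agreements from the diagram: supplier -> consumers.
TOPOLOGY = {
    "Master": ["L1", "L2"],
    "L1": ["S1", "S2"],
    "L2": ["S3", "S4"],
}


def upgrade_order(topology, root):
    """Return servers grouped by tier, deepest tier (pure consumers) first."""
    tiers = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        server, depth = queue.popleft()
        # Breadth-first traversal visits depths in non-decreasing order,
        # so a new tier is opened exactly when depth equals len(tiers).
        if depth == len(tiers):
            tiers.append([])
        tiers[depth].append(server)
        for consumer in topology.get(server, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append((consumer, depth + 1))
    # Reverse so that the bottom of the tree comes first.
    return list(reversed(tiers))


if __name__ == "__main__":
    for tier in upgrade_order(TOPOLOGY, "Master"):
        print(", ".join(tier))
```

Run as a script, this lists the S* tier first, then L*, then the master,
which matches the bottom-to-top order suggested above.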