Re: [389-users] Master caught in infinite loop

Daniel Fenert <daniel@xxxxxxxxx> · Fri, 18 Nov 2011 19:46:35 +0100

On 2011-11-18 14:42, Rich Megginson wrote:
> On 11/18/2011 05:08 AM, Daniel Fenert wrote:
>> Hi,
>>
>> I'm using 389ds 1.2.5 with replication. My current setup:
>>
>> Master
>> |     \
>> L1     L2
>> | \    |  \
>> S1 S2 S3  S4
>>
>> L* - acting as slave to "master" and master to "S*"
>> S* - slaves to L*
>>
>>
>> From time to time (usually a few months between problems) we encounter
>> the "master" going into some kind of infinite loop.
>> After analyzing the access log, it looks like it stops processing queries
>> but keeps accepting new connections until it runs out of fds.
>> After that, it won't stop gracefully; only SIGKILL saves the day.
>>
>> Workload:
>> Master is used only for updates, maybe 20 connections/s.
>> L* are used only for replication.
>> All binds and search queries are targeted to S*, which are read-only.
>>
>> With the previous (less complicated) setup, we also saw this problem:
>> Master
>> |  |  |  \
>> S1 S2 S3  S4...
>>
>> Is there a chance that upgrading to the latest version will fix the
>> problem? Were there any fixes in this area? The upgrade will be complex
>> as hell ;)
>>
>> Error log from last problem:
>>   - Not listening for new connections - too many fds open
> Have you tried increasing the number of fds to 8192?

Yes, but it doesn't make sense: during normal operation the master uses no
more than 50-60 fds.
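
For reference, a minimal sketch of how the fd count of a running process can
be checked on Linux, assuming /proc is mounted. The helper names
(count_open_fds, fd_usage) and the proc_root parameter are only illustrative
and are not part of 389ds; the pid has to be the ns-slapd pid, and reading
another user's /proc/<pid>/fd normally requires root.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count open file descriptors of a process (Linux only).

    Each entry in <proc_root>/<pid>/fd corresponds to one open descriptor.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_usage(pid, limit, proc_root="/proc"):
    """Return (descriptors in use, fraction of the given limit in use)."""
    used = count_open_fds(pid, proc_root)
    return used, used / limit
```

Sampling this periodically and comparing it against the configured limit
(4096 here) would show whether the count creeps up slowly or jumps only once
the server stops answering.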

>>   - slapd shutting down - signaling operation threads
>>   - slapd shutting down - waiting for 120 threads to terminate
> Does the server shut down on its own, or did you shut it down normally
> (i.e. service dirsrv stop)?

We tried to stop it using the init.d scripts.

>> ... SIGKILL ...
>>   - 389-Directory/1.2.5 B2010.012.2034 starting up
>>   - Detected Disorderly Shutdown last time Directory Server was running,
>> recovering database.
>>   - slapd started.  Listening on All Interfaces port 389 for LDAP 
>> requests
>>
>> Number of fds: 4096.
> Since 1.2.5 we have fixed a number of bugs around connection 
> handling.  You might find that 1.2.9.9 (current stable version) works 
> much better for you.

OK, we'll try to upgrade.

How should we upgrade such a complex setup?
Should we try a top-to-bottom approach (master first, then L*, then S*) or
a bottom-to-top one (S*, L*, master last)?
Shutting down all servers at once is not really an option.
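
To make the two candidate orders concrete, here is a small sketch that
derives them from the topology in the diagram. It only groups the servers
into tiers; it does not say which order is the safe one. The TOPOLOGY dict
and the tiers function are illustrative names, not anything from 389ds.

```python
from collections import deque

# Replication topology from the diagram: supplier -> its direct consumers.
TOPOLOGY = {
    "Master": ["L1", "L2"],
    "L1": ["S1", "S2"],
    "L2": ["S3", "S4"],
}


def tiers(topology, root):
    """Group servers by distance from the root supplier (breadth-first)."""
    levels = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(levels):
            levels.append([])  # first server seen at this depth
        levels[depth].append(node)
        for consumer in topology.get(node, []):
            queue.append((consumer, depth + 1))
    return levels


top_down = tiers(TOPOLOGY, "Master")   # master first, then L*, then S*
bottom_up = list(reversed(top_down))   # S* first, then L*, master last
```

Within a tier the servers could be taken one at a time, so that the other
servers of that tier stay up during the upgrade.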

-- 
Daniel Fenert