Re: [389-users] 389DS very slow shutdown

Diego Woitasen <diego@xxxxxxxxxxxxxxx> · Sun, 22 Jan 2012 17:55:05 -0300

On Thu, Nov 10, 2011 at 10:39 PM, Diego Woitasen <diego@xxxxxxxxxxxxxxx> wrote:
> No, I'm not running those searches. I'm sure.
>
> I forgot to mention that I have replication working between 4 servers.
> There will be 150 in the future.
>
> Is there a relation between those searches and replication?
>
>
> On Thu, Nov 10, 2011 at 8:56 PM, Noriko Hosoi <nhosoi@xxxxxxxxxx> wrote:
>> Hello,
>>
>> It looks like you are running lots of psearch like this:
>> ps_service_persistent_searches: entry
>> "cn=csidn,cn=replica,cn=ou\3Dcsidn\2Cou\3DConsulados\2Cdc\3Dmrec\2Cdc\3Dar,cn=mapping
>> tree,cn=config" not enqueued on any persistent search lists
>>
>> $ egrep ps_service_persistent_searches errors | wc -l
>> 55
>>
>> I'm curious whether the behavior changes if you shut down the server after
>> killing them.
>> --noriko
>>
>> Diego Woitasen wrote:
>>> Hi,
>>>   I have a weird problem with 389DS. It takes more than 5 minutes to
>>> shut down. The init script sends a SIGTERM to the process and it
>>> finishes cleanly. That's clear from the log file too:
>>>
>>> grep "slapd shutting down" errors
>>> [10/Nov/2011:17:55:52 -0300] - slapd shutting down - waiting for 22 threads to terminate
>>> [10/Nov/2011:17:55:52 -0300] - slapd shutting down - closing down internal subsystems and plugins
>>> [10/Nov/2011:17:55:52 -0300] - slapd shutting down - waiting for backends to close down
>>> [10/Nov/2011:18:01:41 -0300] - slapd shutting down - backends closed down
>>>
>>> At first I thought that it was related to my 150 DBs, but I created a
>>> test case with a clean server, 150 DBs and 10,000 entries, and the
>>> shutdown takes 2 seconds.
>>>
>>> The only weird thing that I see is the dse.ldif.tmp file being
>>> truncated and written again and again... several times until
>>> shutdown. Strace shows me that the process is writing configuration
>>> entries too.
>>>
>>> I'm using DS 1.2.9.9 (same problem with 1.2.8.3) on Debian Squeeze.
>>>
>>> I set errorlevel to 1 but I don't know if there is anything
>>> interesting in the log. I uploaded the log here if someone wants to
>>> have a look: http://main.woitasen.com.ar/errors
>>>
>>> What can I do to start to discover what's happening here?
>>>
>>> Regards,
>>>   Diego
>>>
>>
>> --
>> 389 users mailing list
>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>
>
>
> --
> Diego Woitasen

I'm trying to figure out what's going on with this again. I ran
ns-slapd with strace for a few minutes:

strace -fco /tmp/trace.ldap -s 1000 /opt/dirsrv/sbin/ns-slapd \
    -D /etc/dirsrv/slapd-mreldc03 \
    -i /var/run/dirsrv/slapd-mreldc03.pid \
    -w /var/run/dirsrv/slapd-mreldc03.startpid \
    -d 0 > /tmp/ldap.out 2>&1

I added the -c option to strace to count the time spent in each
syscall. The top 10 are:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.80 4989.357069      181240     27529       234 futex
  4.87  256.464031       17298     14826           select
  0.29   15.404732        1867      8250         1 poll
  0.03    1.594276           5    333723           fsync
  0.00    0.077862           0   5439669           write
  0.00    0.012001           5      2183           mmap
  0.00    0.008001           4      1895           getsockname
  0.00    0.005716           1      5153         2 read
  0.00    0.004300           5       910           rename
  0.00    0.002482           1      3003           sendto

94.8% of the time is spent in futex, which looks really bad to me. :)
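
The time column is only one way to read that table. Below is a minimal
Python sketch that re-sorts the same ten rows by call count instead. It
rests on an assumption I have not verified for this server: that time
charged to futex, select and poll may be mostly threads blocked waiting,
so the number of calls could say more about actual work. The names
STRACE_SUMMARY and parse_summary are mine, purely for illustration.

```python
# The ten rows of the strace -c summary, copied verbatim from the table.
STRACE_SUMMARY = """\
 94.80 4989.357069      181240     27529       234 futex
  4.87  256.464031       17298     14826           select
  0.29   15.404732        1867      8250         1 poll
  0.03    1.594276           5    333723           fsync
  0.00    0.077862           0   5439669           write
  0.00    0.012001           5      2183           mmap
  0.00    0.008001           4      1895           getsockname
  0.00    0.005716           1      5153         2 read
  0.00    0.004300           5       910           rename
  0.00    0.002482           1      3003           sendto
"""


def parse_summary(text):
    """Turn strace -c summary rows into dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A row has 5 fields, or 6 when the optional errors column is set.
        if len(fields) not in (5, 6):
            continue
        rows.append({
            "pct": float(fields[0]),
            "seconds": float(fields[1]),
            "usecs_per_call": int(fields[2]),
            "calls": int(fields[3]),
            "errors": int(fields[4]) if len(fields) == 6 else 0,
            "syscall": fields[-1],
        })
    return rows


rows = parse_summary(STRACE_SUMMARY)

# Rank by how often each syscall was made rather than by time spent.
by_calls = sorted(rows, key=lambda r: r["calls"], reverse=True)
for r in by_calls[:3]:
    print(f'{r["syscall"]:<12}{r["calls"]:>10} calls {r["seconds"]:>14.6f} s')
```

Sorted this way, write and fsync come out on top, which at least fits
the dse.ldif.tmp rewriting I described in my first mail.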

Ideas are welcome ...
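
To make it easier to compare runs, here is a small Python sketch that
measures the gap between consecutive shutdown messages. It is only an
illustration: LOG_LINES holds the four lines from my first mail
(unwrapped), parse_line is a hypothetical helper, and the "%b" month
parsing assumes an English/C locale.

```python
from datetime import datetime

# The four shutdown messages from the errors log, one per line.
LOG_LINES = [
    "[10/Nov/2011:17:55:52 -0300] - slapd shutting down - waiting for 22 threads to terminate",
    "[10/Nov/2011:17:55:52 -0300] - slapd shutting down - closing down internal subsystems and plugins",
    "[10/Nov/2011:17:55:52 -0300] - slapd shutting down - waiting for backends to close down",
    "[10/Nov/2011:18:01:41 -0300] - slapd shutting down - backends closed down",
]


def parse_line(line):
    """Split one errors-log line into (timestamp, message)."""
    stamp, _, message = line.partition("] - ")
    when = datetime.strptime(stamp.lstrip("["), "%d/%b/%Y:%H:%M:%S %z")
    return when, message


events = [parse_line(line) for line in LOG_LINES]

# Seconds elapsed between each message and the one that follows it.
gaps = [
    (later[0] - earlier[0]).total_seconds()
    for earlier, later in zip(events, events[1:])
]

for (_, message), gap in zip(events, gaps):
    print(f"{gap:>6.0f} s  after: {message}")
```

For the log above, the whole delay falls between "waiting for backends
to close down" and "backends closed down".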

Regards,
 Diego

-- 
Diego Woitasen