Re: [389-users] Skipped request ...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Thu, May 13, 2010 at 6:47 PM, Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
Reinhard Nappert wrote:
> How can I make ns-slapd to produce a core.
>
You first have to start ns-slapd in an environment that you have done
ulimit -c unlimited
then
kill -11 <ns-slapd pid>
> I got it in the same state again, and I did a
> gdb /opt/UMC/jdb/sbin/ns-slapd 16712
>
> 0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
> (gdb) where
> #0  0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
> #1  0x00002b8e906b6a5f in PR_Poll () from /opt/UMC/jdb/lib/dirsrv/libnspr4.so
> #2  0x0000000000415ae7 in slapd_daemon (ports=0x7fff1b44cf50)
>     at ../ldap/servers/slapd/daemon.c:662
> #3  0x000000000041c0b3 in main (argc=7, argv=0x7fff1b44d098)
>     at ../ldap/servers/slapd/main.c:1162
>
> Does this help?
>
Try this:
(gdb) set logging file /tmp/slapd.txt
(gdb) set logging on
(gdb) thread apply all bt
# hit return as many times as needed
(gdb) quit

Then send me /tmp/slapd.txt
> -Reinhard
>
> -----Original Message-----
> From: 389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx [mailto:389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Rich Megginson
> Sent: Thursday, May 13, 2010 6:04 PM
> To: General discussion list for the 389 Directory server project.
> Subject: Re: [389-users] Skipped request ...
>
> Reinhard Nappert wrote:
>
>> Hi Rick,
>>
>> I attached access and error file with debug level 8. The server does not respond to any requests anymore. If you kill the client, it responds afterwards.
>>
>> Let me know, what you see.
>>
>>
> I don't see anything obvious.  One thing I do know is that this code has been improved since 1.1.2 (especially the debugging, which not very usefully prints the file descriptor addresses in int format :P)  I don't suppose you could try to reproduce this with 1.2.5?
>
>> Thanks,
>> -Reinhard
>>
>> -----Original Message-----
>> From: 389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx
>> [mailto:389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Rich
>> Megginson
>> Sent: Thursday, May 13, 2010 1:10 PM
>> To: General discussion list for the 389 Directory server project.
>> Subject: Re: [389-users] Skipped request ...
>>
>> Reinhard Nappert wrote:
>>
>>
>>> Rich, which debugging level do you suggest? Apparently, I tried to much, because it would crash the server constantly.
>>>
>>>
>> Debugging levels should not crash the server - can provide more information about the crash?
>>
>>
>>> For now, I go just with 8 (Connection Management). Seeing the problem, what would you enable?
>>>
>>>
>>>
>> Yes, start with 8.
>>
>>
>>> Thanks,
>>> -Reinhard
>>>
>>> -----Original Message-----
>>> From: 389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx
>>> [mailto:389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Rich
>>> Megginson
>>> Sent: Wednesday, May 12, 2010 6:50 PM
>>> To: General discussion list for the 389 Directory server project.
>>> Subject: Re: [389-users] Skipped request ...
>>>
>>> Reinhard Nappert wrote:
>>>
>>>
>>>
>>>> Hi Rich,
>>>>
>>>> I ran some further tests. This entire thing looks kind of weird. I have a kind of monitoring tool, I use to figure out if the server still responds in a timely manner. This tool performs an anonymous bind and reads a specific object, every 30 seconds.
>>>>
>>>>
>>>>
>>> Does it perform an unbind operation?  Does it disconnect the socket?
>>>
>>>
>>>
>>>>  What I see is that the server responds to the incoming request and it performs about 500 requests within those 30 seconds. Then, I see, when the next monitoring connection request comes is, but I never see the bind. Since this times out, the monitoring tool restarts the server after a while (about 10 seconds).
>>>>
>>>> Here are the logs in access:
>>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from
>>>> 127.0.0.1 to 127.0.0.1
>>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection
>>>> from
>>>> 10.227.6.45 to 10.227.6.53
>>>>
>>>> So, you see the server does not respond to any requests after
>>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from
>>>> 127.0.0.1 to 127.0.0.1
>>>>
>>>> And start responding, once it was restarted:
>>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection
>>>> from
>>>> 10.227.6.45 to 10.227.6.53
>>>>
>>>> I was wondering , if we could get somehow some debugging out of ns-slapd, once it is in this state (truss or something else).
>>>>
>>>>
>>>>
>>>>
>>> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
>>> If that produces too much error log output, or kills the performance,
>>> you can also try replacing the error log with a named pipe+script -
>>> http://directory.fedoraproject.org/wiki/Named_Pipe_Log_Script
>>> man ds-logpipe.py
>>>
>>>
>>>
>>>> Any help is appreciated.
>>>>
>>>> Thanks,
>>>> -Reinhard
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: 389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx
>>>> [mailto:389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Rich
>>>> Megginson
>>>> Sent: Tuesday, May 11, 2010 5:21 PM
>>>> To: General discussion list for the 389 Directory server project.
>>>> Subject: Re: [389-users] Skipped request ...
>>>>
>>>> Reinhard Nappert wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have seen a weird behavior of my DS (1.1.2). It has a very small
>>>>> database (only about 2300 objects). A client performed a one-level
>>>>> search retrieving the children. The server find 114 objects, but
>>>>> the search was very slow:
>>>>>
>>>>> [06/May/2010:12:23:11 +0000] conn=127 op=149 SRCH base=<base>
>>>>> scope=1 filter="(&(&(objectClass=<xyz>)(<att1>=value))(!(<att2>=TRUE)))"
>>>>>
>>>>> yes, the filter is a bit complex, but both attribute types <att1>
>>>>> and <att2> are indexed. This search usually is fast. It looks to me
>>>>> that the server is already in a funny state.
>>>>> ...
>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=149 RESULT err=3 tag=101
>>>>> nentries=114 etime=7
>>>>>
>>>>>
>>>>>
>>>>>
>>>> err=3 is TIMELIMIT_EXCEEDED - that's probably why you aren't getting all of the results you expect, and could be why it's skipping the op.
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> When the client gets the results, it iterates over those and gets
>>>>> its children, like:
>>>>>
>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 SRCH base=<dn of
>>>>> result from previous SRCH> scope=1
>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>>>> Those searches are quick:
>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 RESULT err=0 tag=101
>>>>> nentries=1 etime=0
>>>>>
>>>>> but somehow the server does not process on of the requests, when
>>>>> the client iterates over the results:
>>>>>
>>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn of
>>>>> result from previous SRCH> scope=1
>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0 tag=101
>>>>> nentries=1 etime=0
>>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn of
>>>>> result from previous SRCH> scope=1
>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0 tag=101
>>>>> nentries=0 etime=0 You can see that the server skipped op=264. It
>>>>> looks to me that the request came in, but somehow the server joked
>>>>> up, before it could log the request in access.
>>>>>
>>>>> Has anybody seen such a behavior before?
>>>>>
>>>>> Thanks,
>>>>> -Reinhard
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------
>>>>> -
>>>>> --
>>>>> --
>>>>>
>>>>> --
>>>>> 389 users mailing list
>>>>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> 389 users mailing list
>>>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>> --
>>>> 389 users mailing list
>>>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>
>>>>
>>>>
>>>>
>>> --
>>> 389 users mailing list
>>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>> --
>>> 389 users mailing list
>>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>
>>>
>>>
>> --
>> 389 users mailing list
>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>
>> ----------------------------------------------------------------------
>> --
>>
>> --
>> 389 users mailing list
>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>
>
> --
> 389 users mailing list
> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/389-users
> --
> 389 users mailing list
> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>

--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users

Have you tried exporting / imporing the data to check if it was corrupt in anyway?
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users

[Index of Archives]     [Fedora Directory Users]     [Fedora Directory Devel]     [Fedora Announce]     [Fedora Legacy Announce]     [Kernel]     [Fedora Legacy]     [Share Photos]     [Fedora Desktop]     [PAM]     [Red Hat Watch]     [Red Hat Development]     [Big List of Linux Books]     [Gimp]     [Yosemite News]

  Powered by Linux