Skipped request ...

rmeggins at redhat.com (Rich Megginson) · Thu, 13 May 2010 11:09:42 -0600



Reinhard Nappert wrote:
> Rich, which debugging level do you suggest? Apparently, I tried to much, because it would crash the server constantly.
Debugging levels should not crash the server - can provide more 
information about the crash?
> For now, I go just with 8 (Connection Management). Seeing the problem, what would you enable?
>   
Yes, start with 8.
> Thanks,
> -Reinhard 
>
> -----Original Message-----
> From: 389-users-bounces at lists.fedoraproject.org [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich Megginson
> Sent: Wednesday, May 12, 2010 6:50 PM
> To: General discussion list for the 389 Directory server project.
> Subject: Re: Skipped request ...
>
> Reinhard Nappert wrote:
>   
>> Hi Rich,
>>
>> I ran some further tests. This entire thing looks kind of weird. I have a kind of monitoring tool, I use to figure out if the server still responds in a timely manner. This tool performs an anonymous bind and reads a specific object, every 30 seconds.
>>     
> Does it perform an unbind operation?  Does it disconnect the socket?
>   
>>  What I see is that the server responds to the incoming request and it performs about 500 requests within those 30 seconds. Then, I see, when the next monitoring connection request comes is, but I never see the bind. Since this times out, the monitoring tool restarts the server after a while (about 10 seconds).
>>
>> Here are the logs in access:
>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from 
>> 127.0.0.1 to 127.0.0.1
>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection from 
>> 10.227.6.45 to 10.227.6.53
>>
>> So, you see the server does not respond to any requests after 
>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from 
>> 127.0.0.1 to 127.0.0.1
>>
>> And start responding, once it was restarted:
>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection from 
>> 10.227.6.45 to 10.227.6.53
>>
>> I was wondering , if we could get somehow some debugging out of ns-slapd, once it is in this state (truss or something else).
>>   
>>     
> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
> If that produces too much error log output, or kills the performance, you can also try replacing the error log with a named pipe+script - http://directory.fedoraproject.org/wiki/Named_Pipe_Log_Script
> man ds-logpipe.py
>   
>> Any help is appreciated.
>>
>> Thanks,
>> -Reinhard
>>
>>
>> -----Original Message-----
>> From: 389-users-bounces at lists.fedoraproject.org [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich Megginson
>> Sent: Tuesday, May 11, 2010 5:21 PM
>> To: General discussion list for the 389 Directory server project.
>> Subject: Re: Skipped request ...
>>
>> Reinhard Nappert wrote:
>>   
>>     
>>> Hi all,
>>>  
>>> I have seen a weird behavior of my DS (1.1.2). It has a very small 
>>> database (only about 2300 objects). A client performed a one-level 
>>> search retrieving the children. The server find 114 objects, but the 
>>> search was very slow:
>>>  
>>> [06/May/2010:12:23:11 +0000] conn=127 op=149 SRCH base=<base> scope=1 
>>> filter="(&(&(objectClass=<xyz>)(<att1>=value))(!(<att2>=TRUE)))"
>>>  
>>> yes, the filter is a bit complex, but both attribute types <att1> and 
>>> <att2> are indexed. This search usually is fast. It looks to me that 
>>> the server is already in a funny state.
>>> ...
>>> [06/May/2010:12:23:17 +0000] conn=127 op=149 RESULT err=3 tag=101
>>> nentries=114 etime=7
>>>     
>>>       
>> err=3 is TIMELIMIT_EXCEEDED - that's probably why you aren't getting all of the results you expect, and could be why it's skipping the op.
>>   
>>     
>>>  
>>> When the client gets the results, it iterates over those and gets its 
>>> children, like:
>>>  
>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 SRCH base=<dn of result 
>>> from previous SRCH> scope=1 
>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>> Those searches are quick:
>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 RESULT err=0 tag=101
>>> nentries=1 etime=0
>>>  
>>> but somehow the server does not process on of the requests, when the 
>>> client iterates over the results:
>>>  
>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn of result 
>>> from previous SRCH> scope=1 
>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0 tag=101
>>> nentries=1 etime=0
>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn of result 
>>> from previous SRCH> scope=1 
>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0 tag=101 
>>> nentries=0 etime=0 You can see that the server skipped op=264. It 
>>> looks to me that the request came in, but somehow the server joked up, 
>>> before it could log the request in access.
>>>  
>>> Has anybody seen such a behavior before?
>>>  
>>> Thanks,
>>> -Reinhard
>>>
>>>  
>>> ----------------------------------------------------------------------
>>> --
>>>
>>> --
>>> 389 users mailing list
>>> 389-users at lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>     
>>>       
>> --
>> 389 users mailing list
>> 389-users at lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>> --
>> 389 users mailing list
>> 389-users at lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>   
>>     
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>