Reinhard Nappert wrote: > Rich, which debugging level do you suggest? Apparently, I tried to much, because it would crash the server constantly. Debugging levels should not crash the server - can provide more information about the crash? > For now, I go just with 8 (Connection Management). Seeing the problem, what would you enable? > Yes, start with 8. > Thanks, > -Reinhard > > -----Original Message----- > From: 389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx [mailto:389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Rich Megginson > Sent: Wednesday, May 12, 2010 6:50 PM > To: General discussion list for the 389 Directory server project. > Subject: Re: [389-users] Skipped request ... > > Reinhard Nappert wrote: > >> Hi Rich, >> >> I ran some further tests. This entire thing looks kind of weird. I have a kind of monitoring tool, I use to figure out if the server still responds in a timely manner. This tool performs an anonymous bind and reads a specific object, every 30 seconds. >> > Does it perform an unbind operation? Does it disconnect the socket? > >> What I see is that the server responds to the incoming request and it performs about 500 requests within those 30 seconds. Then, I see, when the next monitoring connection request comes is, but I never see the bind. Since this times out, the monitoring tool restarts the server after a while (about 10 seconds). >> >> Here are the logs in access: >> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from >> 127.0.0.1 to 127.0.0.1 >> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection from >> 10.227.6.45 to 10.227.6.53 >> >> So, you see the server does not respond to any requests after >> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from >> 127.0.0.1 to 127.0.0.1 >> >> And start responding, once it was restarted: >> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection from >> 10.227.6.45 to 10.227.6.53 >> >> I was wondering , if we could get somehow some debugging out of ns-slapd, once it is in this state (truss or something else). >> >> > http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting > If that produces too much error log output, or kills the performance, you can also try replacing the error log with a named pipe+script - http://directory.fedoraproject.org/wiki/Named_Pipe_Log_Script > man ds-logpipe.py > >> Any help is appreciated. >> >> Thanks, >> -Reinhard >> >> >> -----Original Message----- >> From: 389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx [mailto:389-users-bounces@xxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Rich Megginson >> Sent: Tuesday, May 11, 2010 5:21 PM >> To: General discussion list for the 389 Directory server project. >> Subject: Re: [389-users] Skipped request ... >> >> Reinhard Nappert wrote: >> >> >>> Hi all, >>> >>> I have seen a weird behavior of my DS (1.1.2). It has a very small >>> database (only about 2300 objects). A client performed a one-level >>> search retrieving the children. The server find 114 objects, but the >>> search was very slow: >>> >>> [06/May/2010:12:23:11 +0000] conn=127 op=149 SRCH base=<base> scope=1 >>> filter="(&(&(objectClass=<xyz>)(<att1>=value))(!(<att2>=TRUE)))" >>> >>> yes, the filter is a bit complex, but both attribute types <att1> and >>> <att2> are indexed. This search usually is fast. It looks to me that >>> the server is already in a funny state. >>> ... >>> [06/May/2010:12:23:17 +0000] conn=127 op=149 RESULT err=3 tag=101 >>> nentries=114 etime=7 >>> >>> >> err=3 is TIMELIMIT_EXCEEDED - that's probably why you aren't getting all of the results you expect, and could be why it's skipping the op. >> >> >>> >>> When the client gets the results, it iterates over those and gets its >>> children, like: >>> >>> [06/May/2010:12:23:17 +0000] conn=127 op=150 SRCH base=<dn of result >>> from previous SRCH> scope=1 >>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL. >>> Those searches are quick: >>> [06/May/2010:12:23:17 +0000] conn=127 op=150 RESULT err=0 tag=101 >>> nentries=1 etime=0 >>> >>> but somehow the server does not process on of the requests, when the >>> client iterates over the results: >>> >>> [06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn of result >>> from previous SRCH> scope=1 >>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL. >>> [06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0 tag=101 >>> nentries=1 etime=0 >>> [06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn of result >>> from previous SRCH> scope=1 >>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL. >>> [06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0 tag=101 >>> nentries=0 etime=0 You can see that the server skipped op=264. It >>> looks to me that the request came in, but somehow the server joked up, >>> before it could log the request in access. >>> >>> Has anybody seen such a behavior before? >>> >>> Thanks, >>> -Reinhard >>> >>> >>> ---------------------------------------------------------------------- >>> -- >>> >>> -- >>> 389 users mailing list >>> 389-users@xxxxxxxxxxxxxxxxxxxxxxx >>> https://admin.fedoraproject.org/mailman/listinfo/389-users >>> >>> >> -- >> 389 users mailing list >> 389-users@xxxxxxxxxxxxxxxxxxxxxxx >> https://admin.fedoraproject.org/mailman/listinfo/389-users >> -- >> 389 users mailing list >> 389-users@xxxxxxxxxxxxxxxxxxxxxxx >> https://admin.fedoraproject.org/mailman/listinfo/389-users >> >> > > -- > 389 users mailing list > 389-users@xxxxxxxxxxxxxxxxxxxxxxx > https://admin.fedoraproject.org/mailman/listinfo/389-users > -- > 389 users mailing list > 389-users@xxxxxxxxxxxxxxxxxxxxxxx > https://admin.fedoraproject.org/mailman/listinfo/389-users > -- 389 users mailing list 389-users@xxxxxxxxxxxxxxxxxxxxxxx https://admin.fedoraproject.org/mailman/listinfo/389-users