On Thu, May 13, 2010 at 6:47 PM, Rich Megginson <rmeggins at redhat.com> wrote: > Reinhard Nappert wrote: > > How can I make ns-slapd to produce a core. > > > You first have to start ns-slapd in an environment that you have done > ulimit -c unlimited > then > kill -11 <ns-slapd pid> > > I got it in the same state again, and I did a > > gdb /opt/UMC/jdb/sbin/ns-slapd 16712 > > > > 0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6 > > (gdb) where > > #0 0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6 > > #1 0x00002b8e906b6a5f in PR_Poll () from > /opt/UMC/jdb/lib/dirsrv/libnspr4.so > > #2 0x0000000000415ae7 in slapd_daemon (ports=0x7fff1b44cf50) > > at ../ldap/servers/slapd/daemon.c:662 > > #3 0x000000000041c0b3 in main (argc=7, argv=0x7fff1b44d098) > > at ../ldap/servers/slapd/main.c:1162 > > > > Does this help? > > > Try this: > (gdb) set logging file /tmp/slapd.txt > (gdb) set logging on > (gdb) thread apply all bt > # hit return as many times as needed > (gdb) quit > > Then send me /tmp/slapd.txt > > -Reinhard > > > > -----Original Message----- > > From: 389-users-bounces at lists.fedoraproject.org [mailto: > 389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich Megginson > > Sent: Thursday, May 13, 2010 6:04 PM > > To: General discussion list for the 389 Directory server project. > > Subject: Re: Skipped request ... > > > > Reinhard Nappert wrote: > > > >> Hi Rick, > >> > >> I attached access and error file with debug level 8. The server does not > respond to any requests anymore. If you kill the client, it responds > afterwards. > >> > >> Let me know, what you see. > >> > >> > > I don't see anything obvious. One thing I do know is that this code has > been improved since 1.1.2 (especially the debugging, which not very usefully > prints the file descriptor addresses in int format :P) I don't suppose you > could try to reproduce this with 1.2.5? > > > >> Thanks, > >> -Reinhard > >> > >> -----Original Message----- > >> From: 389-users-bounces at lists.fedoraproject.org > >> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich > >> Megginson > >> Sent: Thursday, May 13, 2010 1:10 PM > >> To: General discussion list for the 389 Directory server project. > >> Subject: Re: Skipped request ... > >> > >> Reinhard Nappert wrote: > >> > >> > >>> Rich, which debugging level do you suggest? Apparently, I tried to > much, because it would crash the server constantly. > >>> > >>> > >> Debugging levels should not crash the server - can provide more > information about the crash? > >> > >> > >>> For now, I go just with 8 (Connection Management). Seeing the problem, > what would you enable? > >>> > >>> > >>> > >> Yes, start with 8. > >> > >> > >>> Thanks, > >>> -Reinhard > >>> > >>> -----Original Message----- > >>> From: 389-users-bounces at lists.fedoraproject.org > >>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich > >>> Megginson > >>> Sent: Wednesday, May 12, 2010 6:50 PM > >>> To: General discussion list for the 389 Directory server project. > >>> Subject: Re: Skipped request ... > >>> > >>> Reinhard Nappert wrote: > >>> > >>> > >>> > >>>> Hi Rich, > >>>> > >>>> I ran some further tests. This entire thing looks kind of weird. I > have a kind of monitoring tool, I use to figure out if the server still > responds in a timely manner. This tool performs an anonymous bind and reads > a specific object, every 30 seconds. > >>>> > >>>> > >>>> > >>> Does it perform an unbind operation? Does it disconnect the socket? > >>> > >>> > >>> > >>>> What I see is that the server responds to the incoming request and it > performs about 500 requests within those 30 seconds. Then, I see, when the > next monitoring connection request comes is, but I never see the bind. Since > this times out, the monitoring tool restarts the server after a while (about > 10 seconds). > >>>> > >>>> Here are the logs in access: > >>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from > >>>> 127.0.0.1 to 127.0.0.1 > >>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection > >>>> from > >>>> 10.227.6.45 to 10.227.6.53 > >>>> > >>>> So, you see the server does not respond to any requests after > >>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from > >>>> 127.0.0.1 to 127.0.0.1 > >>>> > >>>> And start responding, once it was restarted: > >>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection > >>>> from > >>>> 10.227.6.45 to 10.227.6.53 > >>>> > >>>> I was wondering , if we could get somehow some debugging out of > ns-slapd, once it is in this state (truss or something else). > >>>> > >>>> > >>>> > >>>> > >>> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting > >>> If that produces too much error log output, or kills the performance, > >>> you can also try replacing the error log with a named pipe+script - > >>> http://directory.fedoraproject.org/wiki/Named_Pipe_Log_Script > >>> man ds-logpipe.py > >>> > >>> > >>> > >>>> Any help is appreciated. > >>>> > >>>> Thanks, > >>>> -Reinhard > >>>> > >>>> > >>>> -----Original Message----- > >>>> From: 389-users-bounces at lists.fedoraproject.org > >>>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich > >>>> Megginson > >>>> Sent: Tuesday, May 11, 2010 5:21 PM > >>>> To: General discussion list for the 389 Directory server project. > >>>> Subject: Re: Skipped request ... > >>>> > >>>> Reinhard Nappert wrote: > >>>> > >>>> > >>>> > >>>> > >>>>> Hi all, > >>>>> > >>>>> I have seen a weird behavior of my DS (1.1.2). It has a very small > >>>>> database (only about 2300 objects). A client performed a one-level > >>>>> search retrieving the children. The server find 114 objects, but > >>>>> the search was very slow: > >>>>> > >>>>> [06/May/2010:12:23:11 +0000] conn=127 op=149 SRCH base=<base> > >>>>> scope=1 > filter="(&(&(objectClass=<xyz>)(<att1>=value))(!(<att2>=TRUE)))" > >>>>> > >>>>> yes, the filter is a bit complex, but both attribute types <att1> > >>>>> and <att2> are indexed. This search usually is fast. It looks to me > >>>>> that the server is already in a funny state. > >>>>> ... > >>>>> [06/May/2010:12:23:17 +0000] conn=127 op=149 RESULT err=3 tag=101 > >>>>> nentries=114 etime=7 > >>>>> > >>>>> > >>>>> > >>>>> > >>>> err=3 is TIMELIMIT_EXCEEDED - that's probably why you aren't getting > all of the results you expect, and could be why it's skipping the op. > >>>> > >>>> > >>>> > >>>> > >>>>> > >>>>> When the client gets the results, it iterates over those and gets > >>>>> its children, like: > >>>>> > >>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 SRCH base=<dn of > >>>>> result from previous SRCH> scope=1 > >>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" > attrs=ALL. > >>>>> Those searches are quick: > >>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 RESULT err=0 tag=101 > >>>>> nentries=1 etime=0 > >>>>> > >>>>> but somehow the server does not process on of the requests, when > >>>>> the client iterates over the results: > >>>>> > >>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn of > >>>>> result from previous SRCH> scope=1 > >>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" > attrs=ALL. > >>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0 tag=101 > >>>>> nentries=1 etime=0 > >>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn of > >>>>> result from previous SRCH> scope=1 > >>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" > attrs=ALL. > >>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0 tag=101 > >>>>> nentries=0 etime=0 You can see that the server skipped op=264. It > >>>>> looks to me that the request came in, but somehow the server joked > >>>>> up, before it could log the request in access. > >>>>> > >>>>> Has anybody seen such a behavior before? > >>>>> > >>>>> Thanks, > >>>>> -Reinhard > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------- > >>>>> - > >>>>> -- > >>>>> -- > >>>>> > >>>>> -- > >>>>> 389 users mailing list > >>>>> 389-users at lists.fedoraproject.org > >>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users > >>>>> > >>>>> > >>>>> > >>>>> > >>>> -- > >>>> 389 users mailing list > >>>> 389-users at lists.fedoraproject.org > >>>> https://admin.fedoraproject.org/mailman/listinfo/389-users > >>>> -- > >>>> 389 users mailing list > >>>> 389-users at lists.fedoraproject.org > >>>> https://admin.fedoraproject.org/mailman/listinfo/389-users > >>>> > >>>> > >>>> > >>>> > >>> -- > >>> 389 users mailing list > >>> 389-users at lists.fedoraproject.org > >>> https://admin.fedoraproject.org/mailman/listinfo/389-users > >>> -- > >>> 389 users mailing list > >>> 389-users at lists.fedoraproject.org > >>> https://admin.fedoraproject.org/mailman/listinfo/389-users > >>> > >>> > >>> > >> -- > >> 389 users mailing list > >> 389-users at lists.fedoraproject.org > >> https://admin.fedoraproject.org/mailman/listinfo/389-users > >> > >> ---------------------------------------------------------------------- > >> -- > >> > >> -- > >> 389 users mailing list > >> 389-users at lists.fedoraproject.org > >> https://admin.fedoraproject.org/mailman/listinfo/389-users > >> > > > > -- > > 389 users mailing list > > 389-users at lists.fedoraproject.org > > https://admin.fedoraproject.org/mailman/listinfo/389-users > > -- > > 389 users mailing list > > 389-users at lists.fedoraproject.org > > https://admin.fedoraproject.org/mailman/listinfo/389-users > > > > -- > 389 users mailing list > 389-users at lists.fedoraproject.org > https://admin.fedoraproject.org/mailman/listinfo/389-users > Have you tried exporting / imporing the data to check if it was corrupt in anyway? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.fedoraproject.org/pipermail/389-users/attachments/20100513/48767bb5/attachment-0001.html