Skipped request ...

edlinuxguru at gmail.com (Edward Capriolo) · Thu, 13 May 2010 21:12:07 -0400

On Thu, May 13, 2010 at 6:47 PM, Rich Megginson <rmeggins at redhat.com> wrote:

> Reinhard Nappert wrote:
> > How can I make ns-slapd produce a core?
> >
> You first have to start ns-slapd in an environment where you have run
> ulimit -c unlimited
> then
> kill -11 <ns-slapd pid>
> > I got it in the same state again, and I did a
> > gdb /opt/UMC/jdb/sbin/ns-slapd 16712
> >
> > 0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
> > (gdb) where
> > #0  0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
> > #1  0x00002b8e906b6a5f in PR_Poll () from /opt/UMC/jdb/lib/dirsrv/libnspr4.so
> > #2  0x0000000000415ae7 in slapd_daemon (ports=0x7fff1b44cf50)
> >     at ../ldap/servers/slapd/daemon.c:662
> > #3  0x000000000041c0b3 in main (argc=7, argv=0x7fff1b44d098)
> >     at ../ldap/servers/slapd/main.c:1162
> >
> > Does this help?
> >
> Try this:
> (gdb) set logging file /tmp/slapd.txt
> (gdb) set logging on
> (gdb) thread apply all bt
> # hit return as many times as needed
> (gdb) quit
>
> Then send me /tmp/slapd.txt
> > -Reinhard
> >
> > -----Original Message-----
> > From: 389-users-bounces at lists.fedoraproject.org [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich Megginson
> > Sent: Thursday, May 13, 2010 6:04 PM
> > To: General discussion list for the 389 Directory server project.
> > Subject: Re: Skipped request ...
> >
> > Reinhard Nappert wrote:
> >
> >> Hi Rich,
> >>
> >> I attached the access and error logs with debug level 8. The server no
> >> longer responds to any requests. If you kill the client, it responds
> >> afterwards.
> >>
> >> Let me know what you see.
> >>
> >>
> > I don't see anything obvious.  One thing I do know is that this code has
> > been improved since 1.1.2 (especially the debugging, which not very usefully
> > prints the file descriptor addresses in int format :P).  I don't suppose you
> > could try to reproduce this with 1.2.5?
> >
> >> Thanks,
> >> -Reinhard
> >>
> >> -----Original Message-----
> >> From: 389-users-bounces at lists.fedoraproject.org
> >> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich
> >> Megginson
> >> Sent: Thursday, May 13, 2010 1:10 PM
> >> To: General discussion list for the 389 Directory server project.
> >> Subject: Re: Skipped request ...
> >>
> >> Reinhard Nappert wrote:
> >>
> >>
> >>> Rich, which debugging level do you suggest? Apparently I tried too
> >>> much, because it crashed the server constantly.
> >>>
> >>>
> >> Debugging levels should not crash the server - can you provide more
> >> information about the crash?
> >>
> >>
> >>> For now, I am going with just 8 (Connection Management). Given the
> >>> problem, what would you enable?
> >>>
> >>>
> >>>
> >> Yes, start with 8.
> >>
> >>
> >>> Thanks,
> >>> -Reinhard
> >>>
> >>> -----Original Message-----
> >>> From: 389-users-bounces at lists.fedoraproject.org
> >>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich
> >>> Megginson
> >>> Sent: Wednesday, May 12, 2010 6:50 PM
> >>> To: General discussion list for the 389 Directory server project.
> >>> Subject: Re: Skipped request ...
> >>>
> >>> Reinhard Nappert wrote:
> >>>
> >>>
> >>>
> >>>> Hi Rich,
> >>>>
> >>>> I ran some further tests. This entire thing looks kind of weird. I
> >>>> have a monitoring tool that I use to check whether the server still
> >>>> responds in a timely manner. This tool performs an anonymous bind and
> >>>> reads a specific object every 30 seconds.
> >>>>
> >>>>
> >>>>
> >>> Does it perform an unbind operation?  Does it disconnect the socket?
> >>>
> >>>
> >>>
> >>>> What I see is that the server responds to the incoming requests and
> >>>> performs about 500 requests within those 30 seconds. Then I see the
> >>>> next monitoring connection request come in, but I never see the bind.
> >>>> Since this times out, the monitoring tool restarts the server after a
> >>>> while (about 10 seconds).
> >>>>
> >>>> Here are the logs in access:
> >>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from 127.0.0.1 to 127.0.0.1
> >>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection from 10.227.6.45 to 10.227.6.53
> >>>>
> >>>> So, you see the server does not respond to any requests after
> >>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from 127.0.0.1 to 127.0.0.1
> >>>>
> >>>> and starts responding again once it has been restarted:
> >>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection from 10.227.6.45 to 10.227.6.53
> >>>>
> >>>> I was wondering if we could somehow get some debugging output from
> >>>> ns-slapd once it is in this state (truss or something else).
> >>>>
> >>>>
> >>>>
> >>>>
> >>> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
> >>> If that produces too much error log output, or kills the performance,
> >>> you can also try replacing the error log with a named pipe+script -
> >>> http://directory.fedoraproject.org/wiki/Named_Pipe_Log_Script
> >>> man ds-logpipe.py
> >>>
> >>>
> >>>
> >>>> Any help is appreciated.
> >>>>
> >>>> Thanks,
> >>>> -Reinhard
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: 389-users-bounces at lists.fedoraproject.org
> >>>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich
> >>>> Megginson
> >>>> Sent: Tuesday, May 11, 2010 5:21 PM
> >>>> To: General discussion list for the 389 Directory server project.
> >>>> Subject: Re: Skipped request ...
> >>>>
> >>>> Reinhard Nappert wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I have seen a weird behavior of my DS (1.1.2). It has a very small
> >>>>> database (only about 2300 objects). A client performed a one-level
> >>>>> search retrieving the children. The server found 114 objects, but
> >>>>> the search was very slow:
> >>>>>
> >>>>> [06/May/2010:12:23:11 +0000] conn=127 op=149 SRCH base=<base> scope=1 filter="(&(&(objectClass=<xyz>)(<att1>=value))(!(<att2>=TRUE)))"
> >>>>>
> >>>>> yes, the filter is a bit complex, but both attribute types <att1>
> >>>>> and <att2> are indexed. This search usually is fast. It looks to me
> >>>>> that the server is already in a funny state.
> >>>>> ...
> >>>>> [06/May/2010:12:23:17 +0000] conn=127 op=149 RESULT err=3 tag=101 nentries=114 etime=7
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>> err=3 is TIMELIMIT_EXCEEDED - that's probably why you aren't getting
> >>>> all of the results you expect, and could be why it's skipping the op.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> When the client gets the results, it iterates over those and gets
> >>>>> its children, like:
> >>>>>
> >>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 SRCH base=<dn of result from previous SRCH> scope=1 filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
> >>>>> Those searches are quick:
> >>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 RESULT err=0 tag=101 nentries=1 etime=0
> >>>>>
> >>>>> but somehow the server does not process one of the requests when
> >>>>> the client iterates over the results:
> >>>>>
> >>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn of result from previous SRCH> scope=1 filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
> >>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0 tag=101 nentries=1 etime=0
> >>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn of result from previous SRCH> scope=1 filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
> >>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0 tag=101 nentries=0 etime=0
> >>>>>
> >>>>> You can see that the server skipped op=264. It looks to me as if
> >>>>> the request came in, but somehow the server choked up before it
> >>>>> could log the request in access.
> >>>>>
> >>>>> Has anybody seen such a behavior before?
> >>>>>
> >>>>> Thanks,
> >>>>> -Reinhard
> >>>>>
> >>>>>
> >>>>> ----------------------------------------------------------------------
> >>>>>
> >>>>> --
> >>>>> 389 users mailing list
> >>>>> 389-users at lists.fedoraproject.org
> >>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >
> >
>
>

Have you tried exporting and re-importing the data to check whether it is
corrupt in any way?
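
On the skipped op=264 reported further up the thread: one way to check
whether other operations are missing is to scan the access log for gaps in
the op numbers of each connection. Below is a minimal Python sketch of that
idea. It assumes one log entry per line in the format quoted in this thread;
the function name find_skipped_ops and the shortened sample lines are my own
illustration, not part of 389 DS.

```python
import re
from collections import defaultdict

# Matches the conn=/op= pair on operation lines such as
# "[06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=..."
# Connection-open lines ("conn=94 fd=83 ...") carry no op= and are ignored.
OP_RE = re.compile(r"\bconn=(\d+)\s+op=(-?\d+)\b")


def find_skipped_ops(lines):
    """Return {conn: [missing op numbers]} for gaps between the lowest
    and highest op number seen on each connection."""
    seen = defaultdict(set)
    for line in lines:
        match = OP_RE.search(line)
        if not match:
            continue
        conn, op = int(match.group(1)), int(match.group(2))
        if op >= 0:  # skip negative op values, which are not client operations
            seen[conn].add(op)

    gaps = {}
    for conn, ops in seen.items():
        missing = sorted(set(range(min(ops), max(ops) + 1)) - ops)
        if missing:
            gaps[conn] = missing
    return gaps


# Shortened versions of the lines quoted in this thread (assumed sample data).
sample = [
    "[06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn> scope=1",
    "[06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0 tag=101 nentries=1 etime=0",
    "[06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn> scope=1",
    "[06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0 tag=101 nentries=0 etime=0",
]
print(find_skipped_ops(sample))  # prints {127: [264]}
```

To run it against a real log, pass an open file object instead of the sample
list. Connection numbers start again at 0 after a server restart (see the
conn=0 line above), so it is probably best to scan each server run
separately.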