Richard Hesse wrote:
> Not much new to report. The server hung again and the only thing in the
> error log with connection tracing is this:
>
> [18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable Runtime
> error -5961 (TCP connection reset by peer.)
> [18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection reset
> by peer)
>
> Which doesn't look like much.

Well, it tells me that the server was attempting to write to a socket and
got an error. -5961 is PR_CONNECT_RESET_ERROR, which can occur if the system
call returns either EPIPE or ECONNRESET. And error 104 is indeed ECONNRESET:

/usr/include/asm-generic/errno.h:#define ECONNRESET 104 /* Connection reset by peer */

AFAICT, this can happen if the client shuts down the socket (for any number
of reasons) while the server is still attempting to send data. In this case,
the client will respond with a TCP RST. I'm not sure how or why this could
happen. I'm open to other causes for ECONNRESET.

What would be really, really interesting is if we could narrow this down to
a particular client application and run ethereal on the connection. Are you
using SSL?

> As for network tuning, it's already been done.
>
> Max descriptors is set to 32768.
>
> Are there any gdb commands I can run while the server is in a hung state?
>

Sure. Whatever the cause of the ECONNRESET, it should not cause the server
to hang, and it would be interesting to find out what it's doing. You'll
have to install the fedora-ds-base-debuginfo package. Attach to the
process -

    gdb /usr/sbin/ns-slapd <pid of process>

Then, dump the thread stacks -

    (gdb) thread apply all bt

If you want the output to go to a file, point gdb logging at the file and
turn logging on before doing the thread apply, e.g.

    (gdb) set logging file stack.txt
    (gdb) set logging on

> I'm going to try running strace while the process is working, and hope for a
> hang. Maybe that will give us some more info.
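For anyone who wants to see the ECONNRESET case described above outside of
the directory server, here is a minimal Python sketch. It is not taken from
the server or NSPR code; the function name provoke_reset and the payload
bytes are made up for illustration, and it assumes a Linux-like TCP stack on
the loopback interface. The client aborts its connection (SO_LINGER with a
zero timeout, so close() sends an RST rather than a FIN) while the "server"
side keeps writing, and the function reports which errno the write fails
with.

```python
import errno
import socket
import struct
import time


def provoke_reset():
    """Return the errno a server-side send() fails with after the client resets.

    Hypothetical helper for illustration only; returns None if no error occurs.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # SO_LINGER enabled with a zero timeout makes close() abort the
    # connection, so the client sends a TCP RST instead of a normal FIN.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()

    try:
        # Keep writing, as a server that has not noticed the reset would.
        for _ in range(50):
            time.sleep(0.02)  # give the RST time to arrive
            server_side.send(b"pretend search result")
    except OSError as exc:
        return exc.errno
    finally:
        server_side.close()
        listener.close()
    return None


if __name__ == "__main__":
    err = provoke_reset()
    print(err, errno.errorcode.get(err))
```

On Linux I would expect the failing write to report ECONNRESET, with EPIPE
as the other possibility, which matches the two system errors mentioned
above for PR_CONNECT_RESET_ERROR. It only shows how the error arises on the
writing side; it says nothing about why the server then hangs.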
>
> -richard
>
> On 2/19/08 10:23 AM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>
>> Richard Hesse wrote:
>>
>>> Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not
>>> configured to use directory data for anything.
>>
>> I just don't know. I've not seen this before. I suppose you could try
>> checking your kernel TCP/IP settings, and increasing the number of file
>> descriptors used -
>> http://directory.fedoraproject.org/wiki/Performance_Tuning
>>
>>> -richard
>>>
>>> On 2/15/08 2:11 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>
>>>> Richard Hesse wrote:
>>>>
>>>>> nsswitch posix users/groups,
>>>>
>>>> Are you using nscd?
>>>>
>>>>> ssh, sudo, puppet (config management), and
>>>>> internally written applications.
>>>>>
>>>>> -richard
>>>>>
>>>>> On 2/15/08 12:53 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>>>
>>>>>> What is the application which is generating this load?
>>>>>
>>>>> --
>>>>> Fedora-directory-users mailing list
>>>>> Fedora-directory-users at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/fedora-directory-users