Richard Hesse wrote:
> Yeah, we're using SSL and TLS so ethereal/tcpdump isn't going to yield much
> info.
It would give us the TCP/IP protocol data, so we could see which clients
and servers are sending the FIN and RST.  It's not so much the LDAP data
I care about, although ssltap might be useful for that.
> The process hung again and strace didn't provide too much information
> other than this:
>
> futex(0x20b9260, FUTEX_WAIT, 2, NULL)
>
> Would that give you a place to start looking?
>
That does suggest a possible deadlock.
> -richard
>
>
> On 2/19/08 4:04 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>
>> Richard Hesse wrote:
>>
>>> Not much new to report. The server hung again and the only thing in the
>>> error log with connection tracing is this:
>>>
>>> [18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable Runtime error -5961 (TCP connection reset by peer.)
>>> [18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection reset by peer)
>>>
>>> Which doesn't look like much.
>>>
>> Well, it tells me that the server was attempting to write to a socket,
>> and got an error.  -5961 is PR_CONNECT_RESET_ERROR which can occur if
>> the system call returns either EPIPE or ECONNRESET.  And error 104 is
>> indeed ECONNRESET.
>> /usr/include/asm-generic/errno.h:#define ECONNRESET 104 /* Connection reset by peer */
>>
>> AFAICT, this can happen if the client shuts down the socket (for any
>> number of reasons) but the server is still attempting to send data.  In
>> this case, the client will respond with a TCP RST.  I'm not sure how or
>> why this could happen.  I'm open to other causes for ECONNRESET.
>> What would be really, really interesting is if we could narrow this down
>> to a particular client application and run ethereal on the connection.
>>
>> Are you using SSL?
>>
>>> As for network tuning, it's already been done.
>>>
>>> Max descriptors is set to 32768.
>>>
>>> Are there any gdb commands I can run while the server is in a hung state?
>>>
>> Sure.  Whatever the cause of the ECONNRESET, it should not cause the
>> server to hang, and it would be interesting to find out what it's
>> doing.  You'll have to install the fedora-ds-base-debuginfo package.
>> Attach to the process - gdb /usr/sbin/ns-slapd <pid of process>
>> Then, dump the thread stacks -
>>
>> (gdb) thread apply all bt
>>
>> If you want the output to go to a file, set the log file name and turn
>> gdb logging on before doing the thread apply, e.g.
>>
>> (gdb) set logging file stack.txt
>> (gdb) set logging on
>>
>>> I'm going to try running strace while the process is working, and hope for a
>>> hang. Maybe that will give us some more info.
>>>
>>> -richard
>>>
>>> On 2/19/08 10:23 AM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>
>>>> Richard Hesse wrote:
>>>>
>>>>> Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not
>>>>> configured to use directory data for anything.
>>>>>
>>>> I just don't know.  I've not seen this before.  I suppose you could try
>>>> checking your kernel TCP/IP settings, and increasing the number of file
>>>> descriptors used -
>>>> http://directory.fedoraproject.org/wiki/Performance_Tuning
>>>>
>>>>> -richard
>>>>>
>>>>> On 2/15/08 2:11 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>>>
>>>>>> Richard Hesse wrote:
>>>>>>
>>>>>>> nsswitch posix users/groups,
>>>>>>>
>>>>>> Are you using nscd?
>>>>>>
>>>>>>> ssh, sudo, puppet (config management), and
>>>>>>> internally written applications.
>>>>>>>
>>>>>>> -richard
>>>>>>>
>>>>>>> On 2/15/08 12:53 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>>>>>
>>>>>>>> What is the application which is generating this load?
>>>>>>>>
>>>>>>> --
>>>>>>> Fedora-directory-users mailing list
>>>>>>> Fedora-directory-users at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/fedora-directory-users
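
The ECONNRESET explanation in this thread (the peer drops the connection while the server is still writing) can be reproduced on loopback. What follows is a minimal sketch that is not taken from the thread, assuming Python 3 on a Linux-like host. The function name provoke_reset is hypothetical, and using SO_LINGER with a zero timeout to force a RST is only one illustrative way a client might reset a connection; the clients in the thread may be doing something else entirely.

```python
import errno
import socket
import struct
import time


def provoke_reset():
    """Return the errno the 'server' side sees when it writes after a peer RST.

    Hypothetical helper for illustration only; returns None if no error occurs.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()

    # Assumption: struct linger is two C ints (true on Linux).
    # l_onoff=1, l_linger=0 makes close() abort the connection with a RST
    # instead of the normal FIN handshake.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.close()
    time.sleep(0.2)  # give the loopback interface time to deliver the RST

    try:
        # Keep writing, as a server that has not noticed the reset would.
        for _ in range(10):
            server_side.send(b"x" * 1024)
            time.sleep(0.05)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

On Linux the write is expected to fail with ECONNRESET, the same error 104 that ber_flush logged; other platforms may report EPIPE instead, which matches the note above that PR_CONNECT_RESET_ERROR covers both.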