Richard Hesse wrote:
Yeah, we're using SSL and TLS so ethereal/tcpdump isn't going to yield much info. The process hung again and strace didn't provide too much information other than this:

futex(0x20b9260, FUTEX_WAIT, 2, NULL)

Would that give you a place to start looking?
Try logconv.pl -V /var/log/dirsrv/slapd-instancename/access
-richard

On 2/19/08 4:04 PM, "Rich Megginson" <rmeggins@xxxxxxxxxx> wrote:

Richard Hesse wrote:
Not much new to report. The server hung again and the only thing in the error log with connection tracing is this:

[18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable Runtime error -5961 (TCP connection reset by peer.)
[18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection reset by peer)

Which doesn't look like much.

Well, it tells me that the server was attempting to write to a socket, and got an error. -5961 is PR_CONNECT_RESET_ERROR which can occur if the system call returns either EPIPE or ECONNRESET. And error 104 is indeed ECONNRESET.

/usr/include/asm-generic/errno.h:#define ECONNRESET 104 /* Connection reset by peer */

AFAICT, this can happen if the client shuts down the socket (for any number of reasons) but the server is still attempting to send data. In this case, the client will respond with a TCP RST. I'm not sure how or why this could happen. I'm open to other causes for ECONNRESET. What would be really, really interesting is if we could narrow this down to a particular client application and run ethereal on the connection. Are you using SSL?

As for network tuning, it's already been done. Max descriptors is set to 32768. Are there any gdb commands I can run while the server is in a hung state?

Sure. Whatever the cause of the ECONNRESET, it should not cause the server to hang, and it would be interesting to find out what it's doing. You'll have to install the fedora-ds-base-debuginfo package. Attach to the process -
gdb /usr/sbin/ns-slapd <pid of process>
Then, dump the thread stacks -
(gdb) thread apply all bt
If you want the output to go to a file, redirect gdb logging to a file first before doing the thread apply e.g.
(gdb) set logging file stack.txt
(gdb) set logging on

I'm going to try running strace while the process is working, and hope for a hang. Maybe that will give us some more info.
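For anyone following the thread, the ECONNRESET explanation above can be reproduced on loopback. This is only a minimal sketch, not anything taken from the directory server: the function name write_after_peer_reset is made up for illustration, and the zero-timeout SO_LINGER close is just one convenient way to make a client send a TCP RST. It assumes Linux behaviour, where the write is expected to fail with ECONNRESET (or EPIPE).

```python
import errno
import socket
import struct
import time


def write_after_peer_reset():
    """Return the errno seen when writing to a socket the peer has reset.

    Hypothetical helper for illustration only; returns 0 if no error occurs.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # SO_LINGER with a zero timeout makes close() send a TCP RST instead of
    # the usual FIN. This stands in for a client that drops the connection.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()
    time.sleep(0.2)  # give the RST time to arrive on loopback

    try:
        # The "server" is still trying to send data to the vanished client.
        server_side.sendall(b"x" * 65536)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    code = write_after_peer_reset()
    print(code, errno.errorcode.get(code))
```

The point of the sketch is only that a reset from the peer surfaces as an error on the writer's side. It says nothing about why the server hangs afterwards.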
-richard

On 2/19/08 10:23 AM, "Rich Megginson" <rmeggins@xxxxxxxxxx> wrote:

Richard Hesse wrote:
Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not configured to use directory data for anything.

I just don't know. I've not seen this before. I suppose you could try checking your kernel TCP/IP settings, and increasing the number of file descriptors used - http://directory.fedoraproject.org/wiki/Performance_Tuning

-richard

On 2/15/08 2:11 PM, "Rich Megginson" <rmeggins@xxxxxxxxxx> wrote:

Richard Hesse wrote:
nsswitch posix users/groups,

Are you using nscd?

ssh, sudo, puppet (config management), and internally written applications.

-richard

On 2/15/08 12:53 PM, "Rich Megginson" <rmeggins@xxxxxxxxxx> wrote:

What is the application which is generating this load?
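On the file descriptor point quoted above, it can be worth confirming what limit a running process actually has, since a configured value and the effective value may differ. The following is a rough sketch that assumes a Linux host with /proc mounted. The helper name open_file_limits is hypothetical, and the pid you pass would be that of ns-slapd.

```python
import os


def open_file_limits(pid):
    """Return the (soft, hard) 'Max open files' limits of a process.

    Reads /proc/<pid>/limits (Linux only). Values are returned as strings
    because the kernel may report 'unlimited' instead of a number.
    """
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Row layout: Max open files <soft> <hard> files
                fields = line.split()
                return fields[3], fields[4]
    raise LookupError("no 'Max open files' row found")


if __name__ == "__main__":
    # Demonstrated on this script's own pid; substitute the server's pid.
    soft, hard = open_file_limits(os.getpid())
    print("soft:", soft, "hard:", hard)
```

If the soft value reported for the server is well below the 32768 mentioned in the thread, the tuning may not have taken effect for that process.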
-- Fedora-directory-users mailing list Fedora-directory-users@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/fedora-directory-users