FDS 1.1 Transport endpoint is not connected

richard at powerset.com (Richard Hesse) · Wed, 20 Feb 2008 15:17:37 -0800

Yeah, we're using SSL and TLS so ethereal/tcpdump isn't going to yield much
info. The process hung again and strace didn't provide too much information
other than this:

futex(0x20b9260, FUTEX_WAIT, 2, NULL)

Would that give you a place to start looking?

-richard
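
A futex(..., FUTEX_WAIT, ...) call that never returns usually means the thread is blocked on a userspace lock, so the per-thread backtraces requested in the quoted reply are the natural next step. Below is a minimal sketch of collecting them non-interactively, assuming gdb is on the PATH, the debuginfo package is installed, and the caller is allowed to attach to the process. The helper names gdb_backtrace_argv and dump_thread_stacks are hypothetical; the binary path, the "thread apply all bt" command, and the stack.txt file name come from the thread.

```python
import subprocess


def gdb_backtrace_argv(pid, binary="/usr/sbin/ns-slapd"):
    """Build a gdb command line that attaches to pid, prints all thread stacks, and exits."""
    return [
        "gdb",
        "-batch",                      # run the -ex commands, then exit
        "-ex", "thread apply all bt",  # same command as in the interactive session
        binary,                        # path taken from the thread; adjust if it differs
        str(pid),                      # "gdb <program> <pid>" attaches to a running process
    ]


def dump_thread_stacks(pid, outfile="stack.txt"):
    """Run gdb against the hung process and save its output to outfile (hypothetical helper)."""
    result = subprocess.run(
        gdb_backtrace_argv(pid),
        capture_output=True,
        text=True,
        check=False,  # keep whatever gdb printed even if it exits non-zero
    )
    with open(outfile, "w") as fh:
        fh.write(result.stdout)
    return result.returncode
```

Calling dump_thread_stacks(<pid of ns-slapd>) while the server is hung should leave the backtraces in stack.txt, ready to attach to a reply.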

On 2/19/08 4:04 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:

> Richard Hesse wrote:
>> Not much new to report. The server hung again and the only thing in the
>> error log with connection tracing is this:
>>
>> [18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable Runtime
>> error -5961 (TCP connection reset by peer.)
>> [18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection reset
>> by peer)
>>
>> Which doesn't look like much.
> Well, it tells me that the server was attempting to write to a socket,
> and got an error.  -5961 is PR_CONNECT_RESET_ERROR which can occur if
> the system call returns either EPIPE or ECONNRESET.  And error 104 is
> indeed ECONNRESET.
> /usr/include/asm-generic/errno.h:#define        ECONNRESET      104
> /* Connection reset by peer */
>
> AFAICT, this can happen if the client shuts down the socket (for any
> number of reasons) but the server is still attempting to send data.  In
> this case, the client will respond with a TCP RST.  I'm not sure how or
> why this could happen.  I'm open to other causes for ECONNRESET.
> What would be really, really interesting is if we could narrow this down
> to a particular client application and run ethereal on the connection.
>
> Are you using SSL?
>> As for network tuning, it's already been done.
>>
>> Max descriptors is set to 32768.
>>
>> Are there any gdb commands I can run while the server is in a hung state?
>>
> Sure.  For whatever the cause of the ECONNRESET, it should not cause the
> server to hang, and it would be interesting to find out what it's
> doing.  You'll have to install the fedora-ds-base-debuginfo package.
> Attach to the process - gdb /usr/sbin/ns-slapd <pid of process>
> Then, dump the thread stacks -
>
> (gdb) thread apply all bt
>
> If you want the output to go to a file, redirect gdb logging to a file
> first before doing the thread apply e.g.
>
> (gdb) set logging file stack.txt
> (gdb) set logging on
>
>
>> I'm going to try running strace while the process is working, and hope for a
>> hang. Maybe that will give us some more info.
>>
>> -richard
>>
>> On 2/19/08 10:23 AM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>
>>
>>> Richard Hesse wrote:
>>>
>>>> Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not
>>>> configured to use directory data for anything.
>>>>
>>>>
>>> I just don't know.  I've not seen this before.  I suppose you could try
>>> checking your kernel TCP/IP settings, and increasing the number of file
>>> descriptors used -
>>> http://directory.fedoraproject.org/wiki/Performance_Tuning
>>>
>>>> -richard
>>>>
>>>>
>>>> On 2/15/08 2:11 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>>> Richard Hesse wrote:
>>>>>
>>>>>
>>>>>> nsswitch posix users/groups,
>>>>>>
>>>>>>
>>>>> Are you using nscd?
>>>>>
>>>>>
>>>>>> ssh, sudo, puppet (config management), and
>>>>>> internally written applications.
>>>>>>
>>>>>> -richard
>>>>>>
>>>>>> On 2/15/08 12:53 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> What is the application which is generating this load?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Fedora-directory-users mailing list
>>>>>> Fedora-directory-users at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/fedora-directory-users
>>>>>>
>>>>>>
>>>>>>
>>>
>>
>>
>
>
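
The error mapping described in the quoted reply above (a write that fails with EPIPE or ECONNRESET is reported as PR_CONNECT_RESET_ERROR, -5961) can be illustrated with a short sketch. This is not NSPR's code: nspr_error_for and describe are hypothetical helpers, and only the two errno values and the -5961 code come from the thread. The numeric value of ECONNRESET is platform-specific; 104 is the Linux value shown in the errno.h line quoted above.

```python
import errno
import os

# NSPR error code quoted in the thread for a reset connection.
PR_CONNECT_RESET_ERROR = -5961


def nspr_error_for(os_errno):
    """Return the NSPR code for the two errno values named in the thread, else None."""
    if os_errno in (errno.EPIPE, errno.ECONNRESET):
        return PR_CONNECT_RESET_ERROR
    return None  # other errno values are outside the scope of this sketch


def describe(os_errno):
    """Format an errno as 'NAME (number): message' for comparison with the error log."""
    name = errno.errorcode.get(os_errno, "UNKNOWN")
    return f"{name} ({os_errno}): {os.strerror(os_errno)}"


if __name__ == "__main__":
    for code in (errno.EPIPE, errno.ECONNRESET):
        print(describe(code), "->", nspr_error_for(code))
```

Running it on the LDAP host is a quick way to confirm that the "error 104" in the ber_flush log line is the same condition as the -5961 reported by PR_Write.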