infiniband failing when too many clients connect at once

Thanks so much for all the help getting this working for us.
The only problem we are still seeing is that when lots of clients connect at once, the servers seem to hang. Nothing is reported in the client log files; the client literally just freezes.

For the benefit of the list, the commands do the following:
- /bu/scripts/EX runs the supplied command sequentially on each of our 6 storage servers.
- /etc/init.d/glustersystem mounts or unmounts a gluster mount point as a service.


For a demonstration, ssh to RTPST201 and run:
/bu/scripts/EX "/etc/init.d/glustersystem stop"
/bu/scripts/EX "/etc/init.d/glustersystem start"

No gluster mount will work until you run:
/bu/scripts/EX "/etc/init.d/glusterserver restart"


The servers all crash with the errors:
2008-03-19 01:58:41 C [ib-verbs-server.c:231:gf_transport_fini] ib-verbs/server: server: called fini on transport: 0x527bc0
2008-03-19 01:58:42 C [ib-verbs.c:1551:ib_verbs_tcp_notify] transport/ib-verbs: server: notify (2) called on tcp socket
2008-03-19 01:58:42 C [ib-verbs.c:1458:ib_verbs_disconnect] transport/ib-verbs: server: peer disconnected, cleaning up


Thanks again!
-Mickey Mazarick

Mickey Mazarick wrote:
Did you make any changes to the server? It's working great; I just want to know if I can take credit. ;-)
I reinstalled OFED with the latest version (1.3).

That seems to have cleared up the weird problem with afr not failing over.

Once again, thanks for your help :-)

-Mickey Mazarick

Amar S. Tumballi wrote:
Hi Mickey,
Is it possible for you to get me a remote login, so that I can debug it sooner?

Regards,
Amar

On Tue, Mar 18, 2008 at 12:35 PM, Mickey Mazarick <mic@xxxxxxxxxxxxxxxxxx> wrote:

    Yes, IB mounting works now :-)

    I ran
    find /system -type f -exec head -n 1 {} \; >/dev/null
    and the client process crashed and locked up the mount after a few
    minutes...

    The server logs these errors exactly once a minute, but there are no
    errors in the client log; it just  too:
    2008-03-18 15:30:30 C [ib-verbs.c:1551:ib_verbs_tcp_notify] transport/ib-verbs: server: notify (2) called on tcp socket
    2008-03-18 15:30:30 C [ib-verbs.c:1458:ib_verbs_disconnect] transport/ib-verbs: server: peer disconnected, cleaning up
    2008-03-18 15:30:30 C [ib-verbs-server.c:231:gf_transport_fini] ib-verbs/server: server: called fini on transport: 0x57c680
    2008-03-18 15:30:30 C [ib-verbs.c:1551:ib_verbs_tcp_notify] transport/ib-verbs: server: notify (2) called on tcp socket
    2008-03-18 15:30:30 C [ib-verbs.c:1458:ib_verbs_disconnect] transport/ib-verbs: server: peer disconnected, cleaning up
    2008-03-18 15:30:30 C [ib-verbs-server.c:231:gf_transport_fini] ib-verbs/server: server: called fini on transport: 0x5a3760
    2008-03-18 15:31:29 C [ib-verbs.c:1551:ib_verbs_tcp_notify] transport/ib-verbs: server: notify (2) called on tcp socket
    2008-03-18 15:31:29 C [ib-verbs.c:1458:ib_verbs_disconnect] transport/ib-verbs: server: peer disconnected, cleaning up
    2008-03-18 15:31:29 C [ib-verbs-server.c:231:gf_transport_fini] ib-verbs/server: server: called fini on transport: 0x5349c0
    2008-03-18 15:31:29 C [ib-verbs.c:1551:ib_verbs_tcp_notify] transport/ib-verbs: server: notify (2) called on tcp socket
    2008-03-18 15:31:29 C [ib-verbs.c:1458:ib_verbs_disconnect] transport/ib-verbs: server: peer disconnected, cleaning up
    2008-03-18 15:31:29 C [ib-verbs-server.c:231:gf_transport_fini] ib-verbs/server: server: called fini on transport: 0x5c3880



    Amar S. Tumballi wrote:
    > Hi Mickey,
    > With the current latest (patch-708), the ib-verbs transport is working
    > fine. You can try it.
    >
    > Regards,
    > Amar
    >
    > On Sun, Mar 16, 2008 at 7:03 PM, Amar S. Tumballi
    > <amar@xxxxxxxxxxxxx> wrote:
    >
    >     Hi Mickey,
    >     I am working on that. You can revert to patch-700 or earlier
    >     until I see what's happening.
    >
    >     Regards,
    >     Amar
    >
    >
    >     On Sun, Mar 16, 2008 at 12:33 PM, Mickey Mazarick
    >     <mic@xxxxxxxxxxxxxxxxxx> wrote:
    >
    >         On the client I get the message "attempting to pipeline
    >         handshake", but we never see any contents. The filesystem
    >         hangs completely until we unmount.
    >         I'll see if I can dig up more info/logs later.
    >
    >         -Mickey Mazarick
    >         --
    >
    >
    >         _______________________________________________
    >         Gluster-devel mailing list
    >         Gluster-devel@xxxxxxxxxx
    >         http://lists.nongnu.org/mailman/listinfo/gluster-devel
    >
    >
    >
    >
    >     --
    >     Amar Tumballi
    >     Gluster/GlusterFS Hacker
    >     [bulde on #gluster/irc.gnu.org]
    >     http://www.zresearch.com - Commoditizing Supercomputing and
    >     Superstorage!
    >
    >
    >
    >





