3.1.2 feedback

Hi Jeremy,

Did you see any out-of-memory messages in dmesg, especially ones related to the glusterfs process? It would also help a lot to have the logs from all client and server processes, ideally captured at log level TRACE. Could you please send them to us?
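
In case it saves some time, here is a rough Python sketch for picking the out-of-memory lines out of saved dmesg output. The pattern list and the function name find_oom_lines are assumptions made for illustration; the kernel's exact wording differs between versions, so please adjust the patterns to what your kernel actually prints.

```python
import re

# Phrases the Linux kernel commonly prints when the OOM killer runs.
# Assumption: the exact wording varies by kernel version, so extend as needed.
OOM_PATTERNS = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_lines(dmesg_text, process_hint="gluster"):
    """Return (all OOM-related lines, the subset that mentions process_hint)."""
    oom_lines = [line for line in dmesg_text.splitlines()
                 if OOM_PATTERNS.search(line)]
    # Case-insensitive substring check for the process name of interest.
    related = [line for line in oom_lines if process_hint in line.lower()]
    return oom_lines, related
```

For example, after saving the output with "dmesg > dmesg.txt", calling find_oom_lines(open("dmesg.txt").read()) returns every OOM-related line plus the ones that mention gluster.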

regards,
----- Original Message -----
> From: "Jeremy Stout" <stout.jeremy at gmail.com>
> To: Gluster-users at gluster.org
> Sent: Saturday, January 22, 2011 7:34:22 AM
> Subject: Re: 3.1.2 feedback
> I have been testing 3.1.2 over the last few days. My overall
> impression is that it resolved several bugs from 3.1.1, but the latest
> version is still prone to crashing under moderate to heavy loads.
> 
> I was running some stress tests on a two server replicated setup today
> with ~150 clients connected with RDMA. The glusterfsd process crashed
> on one server. I waited about 30 minutes to see if the automatic
> fail-over would work, but I continued to receive "Transport: endpoint
> not connected" error messages on all the clients. I saw the following
> error messages in the server log:
> (I removed several hundred error messages from the following snippet)
> [2011-01-21 15:10:13.804308] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x66540x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.supportdir-server)
> [2011-01-21 15:10:13.804314] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x64658x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.supportdir-server)
> [2011-01-21 15:10:13.804342] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 15:10:13.804365] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 15:10:13.804636] I [server.c:428:server_rpc_notify]
> supportdir-server: disconnected connection from 192.168.50.7:1020
> [2011-01-21 15:10:13.804702] I
> [server-helpers.c:670:server_connection_destroy] supportdir-server:
> destroyed connection of
> n7-12719-2011/01/19-17:36:59:497983-supportdir-client-0
> [2011-01-21 15:10:13.805028] I [server.c:428:server_rpc_notify]
> supportdir-server: disconnected connection from 192.168.50.127:1020
> [2011-01-21 15:10:13.805071] I
> [server-helpers.c:670:server_connection_destroy] supportdir-server:
> destroyed connection of
> n127-12567-2011/01/19-17:43:17:468018-supportdir-client-0
> 
> pending frames:
> 
> patchset: v3.1.1-64-gf2a067c
> signal received: 11
> time of crash: 2011-01-21 15:10:13
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.1.2
> /lib64/libc.so.6(+0x32a60)[0x7fc2a7f64a60]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/xlator/protocol/server.so(server_release+0x54)[0x7fc2a4f05454]
> /usr/local/glusterfs/3.1.2/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x26f)[0x7fc2a88d25ef]
> /usr/local/glusterfs/3.1.2/lib/libgfrpc.so.0(rpcsvc_notify+0x123)[0x7fc2a88d2c23]
> /usr/local/glusterfs/3.1.2/lib/libgfrpc.so.0(rpc_transport_notify+0x2d)[0x7fc2a88d6a9d]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/rpc-transport/rdma.so(rdma_pollin_notify+0xd1)[0x7fc2a4ae68b1]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/rpc-transport/rdma.so(rdma_process_recv+0x14b)[0x7fc2a4ae6e8b]
> /usr/local/glusterfs/3.1.2/lib/glusterfs/3.1.2/rpc-transport/rdma.so(+0xb226)[0x7fc2a4ae7226]
> /lib64/libpthread.so.0(+0x6a4f)[0x7fc2a8298a4f]
> /lib64/libc.so.6(clone+0x6d)[0x7fc2a800282d]
> 
> I think the crash is related to this bug:
> http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2197
> 
> I ran some smaller tests on a single server setup. There were ~50
> clients connected via RDMA. While the jobs were running, several of
> them crashed with "File descriptor in bad state" or "Stale File
> Descriptor" errors. Here are the error messages from the server log:
> [2011-01-21 10:15:52.442908] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x16660x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.443012] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x20251x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.442949] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x77360x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.443351] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x26495832x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 40) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.445247] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x25199x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.445291] E [rpcsvc.c:1548:rpcsvc_submit_generic]
> rpc-service: failed to submit message (XID: 0x60907x, Program:
> GlusterFS-3.1.0, ProgVers: 310, Proc: 27) to rpc-transport
> (rdma.maindir-server)
> [2011-01-21 10:15:52.447572] I [server.c:428:server_rpc_notify]
> maindir-server: disconnected connection from 192.168.50.116:1018
> [2011-01-21 10:15:52.455116] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455227] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455325] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455436] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455896] I
> [server-helpers.c:670:server_connection_destroy] maindir-server:
> destroyed connection of
> n116-14977-2011/01/20-12:43:18:128066-maindir-client-0
> [2011-01-21 10:15:52.455610] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455659] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.455564] E [server.c:137:server_submit_reply] :
> Reply submission failed
> [2011-01-21 10:15:52.458581] I [server.c:428:server_rpc_notify]
> maindir-server: disconnected connection from 192.168.50.19:1018
> [2011-01-21 10:15:52.458677] I
> [server-helpers.c:670:server_connection_destroy] maindir-server:
> destroyed connection of
> n19-15053-2011/01/20-12:38:13:243408-maindir-client-0
> (I removed dozens of similar error messages)
> 
> The glusterfsd process did not crash in that instance.
> 
> Jeremy Stout
> 
> On Fri, Jan 21, 2011 at 6:49 AM, David Lloyd
> <david.lloyd at v-consultants.co.uk> wrote:
> > Hello,
> >
> > Haven't heard much feedback about installing glusterfs 3.1.2.
> >
> > Should I infer that it's all gone extremely very smoothly for
> > everyone, or
> > is everyone being as cowardly as me and waiting for others to do it
> > first?
> >
> > Cheers
> > David
> >
> > --
> > David Lloyd
> > V Consultants
> > www.v-consultants.co.uk
> > tel: +44 7983 816501
> > skype: davidlloyd1243
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> >
> >

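Since several hundred messages were trimmed from the server logs quoted above, a per-function count of the full log may be easier to share than the raw lines. The following is a minimal sketch, assuming the raw log file keeps each entry on one line in the "[timestamp] LEVEL [file.c:line:function]" layout seen in the quoted snippets; the name summarize is hypothetical.

```python
import re
from collections import Counter

# Header of one log entry, as seen in the quoted server logs:
#   [2011-01-21 15:10:13.804308] E [rpcsvc.c:1548:rpcsvc_submit_generic] ...
# Assumption: the raw log file has one entry per line (not re-wrapped by email).
ENTRY_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\]"
)


def summarize(log_text):
    """Count log entries per (level, function) pair."""
    counts = Counter()
    for match in ENTRY_RE.finditer(log_text):
        counts[(match.group("level"), match.group("func"))] += 1
    return counts
```

Iterating over summarize(text).most_common() lists the noisiest (level, function) pairs first, which should show at a glance whether rpcsvc_submit_generic failures dominate the log.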
