3.3qa3 - rdma failing to start - how do I completely clear config?

Discovered I had some stray glusterfs processes still running - killing these off fixed the issue.
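
In case it's useful to anyone searching later, roughly what this amounted to (service and command names assume a Debian/Ubuntu-style glusterfs-server install - adjust for your distro or a source build):

    # stop the service, then check for surviving gluster processes
    service glusterfs-server stop
    ps aux | grep -E 'gluster(d|fs|fsd)' | grep -v grep

    # kill any leftovers, then start cleanly
    pkill glusterd; pkill glusterfsd; pkill glusterfs
    service glusterfs-server start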

Iain


On 4 Jun 2013, at 08:42, Iain Buchanan <iainbuc at gmail.com> wrote:

> I've been testing GlusterFS (3.3qa3) and have managed to get into a situation where it won't start up.  I've tried removing it and reinstalling (removing the package, manually wiping the /var/lib/glusterd and /var/log/glusterfs folders, then reinstalling the package).  With no volumes set up I tried to add a peer (this failed: unknown errno 107).  After restarting glusterfs-server I see a steady stream of events in the log:
> 
> First there are a load like this:
> 
> [2013-06-04 07:31:22.644256] E [rdma.c:4658:gf_rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 127.0.0.1:61) after handshake is complete
> [2013-06-04 07:31:22.644314] W [rdma.c:4521:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterd(main+0x35a) [0x7f5ec03dc47a] (-->/usr/lib/libglusterfs.so.0(+0x3c0b7) [0x7f5ebff760b7] (-->/usr/lib/glusterfs/3.3git/rpc-transport/rdma.so(+0x5140) [0x7f5ebb685140]))) 0-rpc-transport/rdma: rdma.management: peer (127.0.0.1:61) disconnected, cleaning up
> 
> Then it stabilises with these messages:
> 
> 3dc47a] (-->/usr/lib/libglusterfs.so.0(+0x3c0b7) [0x7f5ebff760b7] (-->/usr/lib/glusterfs/3.3git/rpc-transport/rdma.so(+0x5140) [0x7f5ebb685140]))) 0-rpc-transport/rdma: rdma.management: peer (127.0.0.1:52511) disconnected, cleaning up
> [2013-06-04 07:32:35.015360] E [rpcsvc.c:491:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
> [2013-06-04 07:32:35.015384] W [rdma.c:3216:gf_rdma_pollin_notify] 0-rpc-transport/rdma: transport_notify failed
> [2013-06-04 07:32:35.015396] W [rdma.c:3331:gf_rdma_recv_request] 0-rpc-transport/rdma: pollin notification failed
> [2013-06-04 07:32:35.015411] W [rdma.c:3411:gf_rdma_process_recv] 0-rpc-transport/rdma: receiving a request from peer (192.168.0.62:54064) failed
> [2013-06-04 07:32:35.015441] W [rdma.c:4187:gf_rdma_disconnect] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5ebf8f8e9a] (-->/usr/lib/glusterfs/3.3git/rpc-transport/rdma.so(+0xc05e) [0x7f5ebb68c05e] (-->/usr/lib/glusterfs/3.3git/rpc-transport/rdma.so(gf_rdma_process_recv+0xef) [0x7f5ebb68ba8f]))) 0-rdma.management: disconnect called (peer:192.168.0.62:54064)
> [2013-06-04 07:32:35.015525] W [rdma.c:4521:gf_rdma_handshake_pollerr] (-->/usr/sbin/glusterd(main+0x35a) [0x7f5ec03dc47a] (-->/usr/lib/libglusterfs.so.0(+0x3c0b7) [0x7f5ebff760b7] (-->/usr/lib/glusterfs/3.3git/rpc-transport/rdma.so(+0x5140) [0x7f5ebb685140]))) 0-rpc-transport/rdma: rdma.management: peer (192.168.0.62:54064) disconnected, cleaning up
> [2013-06-04 07:32:35.049460] E [rpcsvc.c:491:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
> [2013-06-04 07:32:35.049479] W [rdma.c:3216:gf_rdma_pollin_notify] 0-rpc-transport/rdma: transport_notify failed
> [2013-06-04 07:32:35.049490] W [rdma.c:3331:gf_rdma_recv_request] 0-rpc-transport/rdma: pollin notification failed
> [2013-06-04 07:32:35.049500] W [rdma.c:3411:gf_rdma_process_recv] 0-rpc-transport/rdma: receiving a request from peer (192.168.0.62:54065) failed
> [2013-06-04 07:32:35.049534] W [rdma.c:4187:gf_rdma_disconnect] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5ebf8f8e9a] (-->/usr/lib/glusterfs/3.3git/rpc-transport/rdma.so(+0xc05e) [0x7f5ebb68c05e] (-->/usr/lib/glusterfs/3.3git/rpc-transport/rdma.so(gf_rdma_process_recv+0xef) [0x7f5ebb68ba8f]))) 0-rdma.management: disconnect called (peer:192.168.0.62:54065)
> 
> (192.168.0.62 is the machine I attempted to peer it with.)
> 
> Is there something else that needs to be cleared for a reinstall?  (I haven't set up a volume at this point, so I'm guessing old file attributes are not an issue.)  The last thing I did before reinstalling was an attempt to add a brick to a test volume over rdma with replica set to the number of bricks (hoping the data would copy across, but it pretty much locked up the machine - I'll retry with less data next time).
> 
> Iain
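
For the archives: to answer my own question about a complete clear, the sequence below is roughly what I'd try (paths and package names assume the Debian/Ubuntu packaging; /etc/glusterfs holds glusterd.vol, so only remove it if you're reinstalling the package anyway - no guarantees this covers every case):

    service glusterfs-server stop
    pkill glusterd; pkill glusterfsd; pkill glusterfs    # make sure nothing survives the stop
    apt-get remove --purge glusterfs-server
    rm -rf /var/lib/glusterd /var/log/glusterfs /etc/glusterfs
    apt-get install glusterfs-server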


