mixing rdma/tcp bricks, rebalance operation locked up

I built an rdma-based volume out of 5 bricks:

$ gluster volume info

Volume Name: glrdma
Type: Distribute
Status: Started
Number of Bricks: 5
Transport-type: rdma
Bricks:
Brick1: pbs1:/data2
Brick2: pbs2:/data2
Brick3: pbs3:/data2
Brick4: pbs3:/data
Brick5: pbs4:/data
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on

and everything was working well.  I then tried to add a TCP/socket 
brick to it, thinking that it would be refused, but gluster happily 
added it:

$ gluster volume info

Volume Name: glrdma
Type: Distribute
Status: Started
Number of Bricks: 6
Transport-type: rdma
Bricks:
Brick1: pbs1:/data2
Brick2: pbs2:/data2
Brick3: pbs3:/data2
Brick4: pbs3:/data
Brick5: pbs4:/data
Brick6: dabrick:/data2   <-- TCP/socket brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on

However, not too surprisingly, there were problems when I tried to 
rebalance onto the added brick.  Gluster allowed me to start a 
rebalance/fix-layout, but it never finished, and the logs continue to 
fill with reports of 'connection refused' (see the log extracts at the 
bottom).

Attempts to remove the TCP brick are unsuccessful, even after stopping 
the volume:

$ gluster volume stop glrdma 
Stopping volume will make its data inaccessible. Do you want to 
continue? (y/n) y
Stopping volume glrdma has been successful

$ gluster volume remove-brick glrdma dabrick:/data2
Removing brick(s) can result in data loss. Do you want to Continue? 
(y/n) y
Remove Brick unsuccessful

The log shows more errors citing a missing 'option transport-type' and 
defaulting to "socket":

[2011-12-13 10:34:57.241676] I [cli-rpc-ops.c:1073:gf_cli3_1_remove_brick_cbk] 0-cli: Received resp to remove brick
[2011-12-13 10:34:57.241852] I [input.c:46:cli_batch] 0-: Exiting with: -1
[2011-12-13 10:46:08.937294] W [rpc-transport.c:606:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2011-12-13 10:46:09.110636] I [cli-rpc-ops.c:417:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2011-12-13 10:46:09.110845] I [cli-rpc-ops.c:596:gf_cli3_1_get_volume_cbk] 0-: Returning: 0
[2011-12-13 10:46:09.111038] I [cli-rpc-ops.c:417:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2011-12-13 10:46:09.111070] I [cli-rpc-ops.c:596:gf_cli3_1_get_volume_cbk] 0-: Returning: 0
[2011-12-13 10:46:09.111080] I [input.c:46:cli_batch] 0-: Exiting with: 0
[2011-12-13 10:52:18.142283] W [rpc-transport.c:606:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"


And the rebalance operations now seem to be locked up, since the 
responses to the rebalance commands contradict each other (the commands 
were given serially, with no other intervening commands):

$ gluster volume rebalance glrdma fix-layout start
Rebalance on glrdma is already started

$ gluster volume rebalance glrdma fix-layout status
rebalance stopped

$ gluster volume rebalance glrdma fix-layout stop
stopped rebalance process of volume glrdma 
(after rebalancing 0 files totaling 0 bytes)

$ gluster volume rebalance glrdma fix-layout start
Rebalance on glrdma is already started

Is there a way to back out of this situation? Or has incorrectly 
adding the TCP brick permanently hosed the volume? 

And does this imply a bug in the add-brick routine? (hopefully fixed?)




Logs extracts
--------------
tc-glusterd-mount-glrdma.log (and nfs.log, even though I haven't tried 
to export it via NFS) has zillions of these lines:

[2011-12-13 10:36:11.702130] E [rdma.c:4417:tcp_connect_finish] 0-glrdma-client-5: tcp connect to  failed (Connection refused)

cli.log has many of these lines:
[2011-12-13 10:34:55.142428] W [rpc-transport.c:606:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
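
To put rough numbers on "zillions" and "many", the entries in a log can 
be tallied by severity and source location. The following is only a 
minimal sketch: it assumes each entry occupies a single line in the log 
file, in the form "[date time] SEVERITY [file:line:function] message", 
and the function name summarize_gluster_log is made up for illustration.

```shell
# Tally log entries by severity letter and [file:line:function] tag.
# Reads log text on stdin. Assumes one entry per line, laid out as:
#   [date time] SEVERITY [file:line:function] message...
# summarize_gluster_log is a hypothetical helper, not part of gluster.
summarize_gluster_log() {
    awk '$1 ~ /^\[/ && $3 ~ /^[A-Z]$/ { count[$3 " " $4]++ }
         END { for (k in count) print count[k], k }' | sort -rn
}
```

It would be fed a log on stdin, e.g. "summarize_gluster_log < cli.log" 
(substitute the real path to the log), and prints one line per distinct 
severity/location pair, most frequent first.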


-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487 
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
This signature has been OCCUPIED!