Cannot stop/delete/shrink volume: "transport endpoint not connected"

dkopelevich at che.ufl.edu (Dmitry Kopelevich) · Sat, 01 Oct 2011 09:55:07 -0400

I cannot stop, delete, or shrink (i.e., remove bricks from) a GlusterFS
volume. Each attempt fails with an "operation failed" message.

According to 'gluster peer status', all peers are connected. However, I
cannot add a new peer to the pool. The 'gluster peer probe' command
reports no errors, but 'gluster peer status' then shows the new peer as
disconnected (all existing peers remain connected).

The only clue to the cause that I have been able to find is the
following pair of messages in
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

[2011-09-28 20:50:12.128797] E [glusterd-handler.c:1137:glusterd_handle_cli_stop_volume] 0-: Unable to set cli op: 16
[2011-09-28 20:50:12.129604] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1008)
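
For anyone who wants to pull such entries out of a longer log, here is a
minimal Python sketch. It assumes every entry follows the layout of the
two messages quoted above (timestamp, one-letter level, file:line:function,
domain, message); other glusterd log layouts may not match. The names
ENTRY, SAMPLE, and parse_entries are made up for this illustration and
are not part of GlusterFS.

```python
import re

# Layout assumed from the two messages quoted above:
# [timestamp] LEVEL [file:line:function] domain: message
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"
    r"(?P<msg>.*)"
)

# A new entry starts with "[YYYY-MM-DD "; anything else is treated as a
# continuation of the previous entry (e.g. text wrapped by a mail client).
ENTRY_START = re.compile(r"\[\d{4}-\d{2}-\d{2} ")

# The two messages from this post, wrapped as they were in the email.
SAMPLE = """\
[2011-09-28 20:50:12.128797] E
[glusterd-handler.c:1137:glusterd_handle_cli_stop_volume] 0-: Unable to
set cli op: 16
[2011-09-28 20:50:12.129604] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (127.0.0.1:1008)
"""


def parse_entries(text):
    """Return a list of dicts, one per log entry that matches ENTRY."""
    joined = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not joined:
            joined.append(line)          # start of a new entry
        else:
            joined[-1] += " " + line     # continuation of a wrapped entry
    parsed = []
    for entry in joined:
        match = ENTRY.match(entry)
        if match:                        # silently skip lines in other layouts
            parsed.append(match.groupdict())
    return parsed


if __name__ == "__main__":
    # Show only errors (E) and warnings (W).
    for item in parse_entries(SAMPLE):
        if item["level"] in ("E", "W"):
            print(item["ts"], item["level"], item["func"], "->", item["msg"])
```

To run it against the real log, read the file and pass its contents to
parse_entries() in place of SAMPLE.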

I am getting similar messages in response to the 'gluster volume 
stop/delete/remove-brick' and 'gluster peer probe' commands.

The problem appears to be related to the "Transport endpoint is not
connected" error reported for the local host (127.0.0.1), but I cannot
figure out what is causing it. Any suggestions would be greatly
appreciated.
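
Since the warning names a peer on 127.0.0.1, one basic check is whether
the local glusterd management port accepts TCP connections at all. The
Python sketch below assumes glusterd is listening on its default
management port, 24007; that port number is not taken from the log
above, and the helper name can_connect is made up for this
illustration. A successful connection only shows that something is
listening on the port, not that glusterd is otherwise healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connection; return (True, None) or (False, error)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # refused, timed out, unreachable, ...
        return False, exc


if __name__ == "__main__":
    # Assumption: 24007 is the default glusterd management port.
    ok, err = can_connect("127.0.0.1", 24007)
    if ok:
        print("127.0.0.1:24007 accepts TCP connections")
    else:
        print("cannot connect to 127.0.0.1:24007:", err)
```

Running the same check from each node against the other nodes' addresses
would show whether the management port is reachable across the pool.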

P.S. The volume is a 16-brick distributed volume using InfiniBand
transport (one brick per node). The OS is CentOS 5.5.

-- 
Dmitry Kopelevich
Associate Professor
Chemical Engineering Department
University of Florida
Gainesville, FL 32611

Phone: (352)-392-4422
Fax:     (352)-392-9513
E-mail:  dkopelevich at che.ufl.edu
