Hello everyone,
we are experiencing the following problem in our HPC cluster, which has a
gluster filesystem built using unify over afr on several pairs of nodes.
Every once in a while, one of our nodes freezes: it still replies to
'ping', but it accepts neither ssh connections nor direct terminal
access. While a node is frozen like this, the gluster filesystem crashes,
whereas if the frozen node is shut down, the filesystem works (in
degraded mode). Since most of our running jobs write to glusterfs,
this is becoming quite a serious problem.
Our server spec files look like this:
-------------------------------------------------------------------
### GlusterFS Server Volume Specification
### Export volume brick.
volume brick
type storage/posix
option directory /state/partition1/glfsdir/
end-volume
### Add network serving capability to brick.
volume server
type protocol/server
option transport-type tcp/server
option listen-port 6996
subvolumes brick
option auth.ip.brick.allow 10.*.*.*
end-volume
-------------------------------------------------------------------
and our client spec files look like this:
-------------------------------------------------------------------
### GlusterFS Client Volume Specification
### Add client feature and attach to remote subvolume of server
volume brick0-0
type protocol/client
option transport-type tcp/client
option remote-host compute-0-0
option remote-subvolume brick
end-volume
[... several of those, up to compute 7-5 ...]
volume brick7-5
type protocol/client
option transport-type tcp/client
option remote-host compute-7-5
option remote-subvolume brick
end-volume
### Namespace brick
volume local-ns
type protocol/client
option transport-type tcp/client
option remote-host vulcano
option remote-subvolume brick-ns
end-volume
### Automatic File Replication
volume afr1
type cluster/afr
subvolumes brick0-0 brick4-0
end-volume
[... several of those, up to afr24 ...]
volume afr24
type cluster/afr
subvolumes brick3-5 brick7-5
end-volume
### Unify
volume unify
type cluster/unify
subvolumes afr1 afr2 afr3 afr4 afr5 afr6 afr7 afr8 afr9 afr10 afr11 afr12 afr13 afr14 afr15 afr16 afr17 afr18 afr19 afr20 afr21 afr22 afr23 afr24
option namespace local-ns
# ALU scheduler
option scheduler alu # use the ALU scheduler
# Don't create files on a volume with less than 5% free diskspace
option alu.limits.min-free-disk 5%
## When deciding where to place a file, first look at the write-usage, then at
## read-usage, disk-usage, open files, and finally the disk-speed-usage.
option alu.order write-usage:read-usage:disk-usage:open-files-usage:disk-speed-usage
# Kick in when the write-usage discrepancy is 20%
option alu.write-usage.entry-threshold 20%
# Don't stop until the discrepancy has been reduced to 5%
option alu.write-usage.exit-threshold 15%
# Kick in when the read-usage discrepancy is 20%
option alu.read-usage.entry-threshold 20%
# Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
option alu.read-usage.exit-threshold 4%
# Kick in if the discrep. in disk-usage between volumes is more than 10GB
option alu.disk-usage.entry-threshold 10GB
# Don't stop writing to the least-used volume until the discrep. is 9GB
option alu.disk-usage.exit-threshold 1GB
# Kick in if the discrepancy in open files is 1024
option alu.open-files-usage.entry-threshold 1024
# Stop when 992 files have been written in the least-used vol.
option alu.open-files-usage.exit-threshold 32
# option alu.disk-speed-usage.entry-threshold # NEVER SET IT. SPEED IS CONSTANT!!!
# option alu.disk-speed-usage.exit-threshold # NEVER SET IT. SPEED IS CONSTANT!!!
# Refresh the statistics used for decision-making every 10 seconds
option alu.stat-refresh.interval 10sec
# Refresh the statistics used for decision-making after creating 10 files
# option alu.stat-refresh.num-file-create 10
## NUFA scheduler
# option scheduler nufa
# option nufa.local-volume-name afr24
end-volume
-------------------------------------------------------------------
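The comments on the ALU thresholds above all follow one rule: a criterion kicks in when the discrepancy between volumes reaches the entry threshold, and stays in force until the discrepancy has dropped to entry minus exit (20% - 15% = 5%, 20% - 4% = 16%, 10GB - 1GB = 9GB, 1024 - 32 = 992). Below is a minimal Python sketch of that hysteresis, assuming the comments describe the behaviour correctly. The class names are hypothetical, and this is not GlusterFS's actual ALU code.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Entry/exit pair for one ALU criterion (simplified model, hypothetical)."""
    entry: float
    exit: float

    def stop_level(self) -> float:
        # Reading of the spec comments: once active, the criterion stays
        # active until the discrepancy has dropped to entry - exit.
        return self.entry - self.exit


class HysteresisCriterion:
    """Tracks whether one criterion is currently steering file placement."""

    def __init__(self, threshold: Threshold):
        self.t = threshold
        self.active = False

    def update(self, discrepancy: float) -> bool:
        """Feed the current discrepancy; return True while the criterion is active."""
        if not self.active and discrepancy >= self.t.entry:
            self.active = True   # "kick in"
        elif self.active and discrepancy <= self.t.stop_level():
            self.active = False  # discrepancy reduced far enough, stop
        return self.active


# Values taken from the client spec above.
write_usage = Threshold(entry=20, exit=15)      # percent
read_usage = Threshold(entry=20, exit=4)        # percent
disk_usage = Threshold(entry=10, exit=1)        # GB
open_files = Threshold(entry=1024, exit=32)     # files
```

The gap between entry and stop level is what keeps the scheduler from flapping between volumes when the discrepancy hovers around a single value.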
The namespace is provided by the frontend 'vulcano', which does not
otherwise contribute to the filesystem. The scheduler is NUFA on the
nodes and ALU on the frontend. We recently added 'option self-heal on'
to the afr volumes and 'option transport-timeout 10' to the basic node
bricks ('brickX-X'), but that had no effect on our problem.
What we get in /var/log/glusterfs/glusterfs.log is always something like
this (with two node freeze examples):
-------------------------------------------------------------------
2008-06-06 20:54:41 W [client-protocol.c:204:call_bail] brick6-0:
activating bail-out. pending frames = 4. last sent = 2008-06-06
20:51:51. last received = 2008-06-06 20:43:08 transport-timeout = 108
2008-06-06 20:54:41 C [client-protocol.c:211:call_bail] brick6-0:
bailing transport
2008-06-06 20:54:41 W [client-protocol.c:4759:client_protocol_cleanup]
brick6-0: cleaning up state in transport object 0x554c10
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup]
brick6-0: forced unwinding frame type(1) op(34) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:4405:client_lookup_cbk]
brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup]
brick6-0: forced unwinding frame type(1) op(15) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:3866:client_statfs_cbk]
brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup]
brick6-0: forced unwinding frame type(1) op(34) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:4405:client_lookup_cbk]
brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup]
brick6-0: forced unwinding frame type(1) op(15) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:3866:client_statfs_cbk]
brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 21:00:10 W [client-protocol.c:279:client_protocol_xfer]
brick6-0: attempting to pipeline request type(1) op(35) with handshake
2008-06-06 21:00:10 W [client-protocol.c:4759:client_protocol_cleanup]
brick6-0: cleaning up state in transport object 0x554c10
2008-06-06 21:00:10 E [client-protocol.c:4809:client_protocol_cleanup]
brick6-0: forced unwinding frame type(1) op(35) reply=@0x2a9622d710
2008-06-06 21:00:10 E [tcp-client.c:190:tcp_connect] brick6-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:00:10 W [client-protocol.c:331:client_protocol_xfer]
brick6-0: not connected at the moment to submit frame type(1) op(35)
2008-06-06 21:00:57 E [tcp-client.c:190:tcp_connect] brick6-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:02:32 E [tcp-client.c:190:tcp_connect] brick6-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:02:32 W [client-protocol.c:331:client_protocol_xfer]
brick6-0: not connected at the moment to submit frame type(1) op(15)
2008-06-06 21:02:32 E [client-protocol.c:3866:client_statfs_cbk]
brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 21:04:24 E [protocol.c:271:gf_block_unserialize_transport]
local-ns: EOF from peer (10.1.1.1:6996)
2008-06-06 21:04:24 W [client-protocol.c:4759:client_protocol_cleanup]
local-ns: cleaning up state in transport object 0x58b580
2008-06-06 21:10:24 E [protocol.c:271:gf_block_unserialize_transport]
local-ns: EOF from peer (10.1.1.1:6996)
2008-06-06 21:10:24 W [client-protocol.c:4759:client_protocol_cleanup]
local-ns: cleaning up state in transport object 0x58e980
2008-06-09 11:11:26 W [client-protocol.c:204:call_bail] brick6-5:
activating bail-out. pending frames = 1. last sent = 2008-06-09
11:11:06. last received = 2008-06-09 04:02:03 transport-timeout = 10
2008-06-09 11:11:26 C [client-protocol.c:211:call_bail] brick6-5:
bailing transport
2008-06-09 11:11:26 W [client-protocol.c:4759:client_protocol_cleanup]
brick6-5: cleaning up state in transport object 0x56eef0
2008-06-09 11:11:26 E [client-protocol.c:4809:client_protocol_cleanup]
brick6-5: forced unwinding frame type(1) op(15) reply=@0x2a96205fd0
2008-06-09 11:11:26 E [client-protocol.c:3866:client_statfs_cbk]
brick6-5: no proper reply from server, returning ENOTCONN
2008-06-09 13:32:52 W [client-protocol.c:279:client_protocol_xfer]
brick6-5: attempting to pipeline request type(1) op(15) with handshake
2008-06-09 13:33:09 W [client-protocol.c:204:call_bail] brick6-5:
activating bail-out. pending frames = 1. last sent = 2008-06-09
13:32:52. last received = 1970-01-01 01:00:00 transport-timeout = 10
2008-06-09 13:33:09 C [client-protocol.c:211:call_bail] brick6-5:
bailing transport
2008-06-09 13:33:09 W [client-protocol.c:4759:client_protocol_cleanup]
brick6-5: cleaning up state in transport object 0x56eef0
2008-06-09 13:33:09 E [client-protocol.c:4809:client_protocol_cleanup]
brick6-5: forced unwinding frame type(1) op(15) reply=@0x2a962009e0
2008-06-09 13:33:09 E [fuse-bridge.c:2487:fuse_thread_proc]
glusterfs-fuse: fuse_chan_receive() returned -1 (25)
2008-06-09 13:33:09 E [client-protocol.c:3866:client_statfs_cbk]
brick6-5: no proper reply from server, returning ENOTCONN
-------------------------------------------------------------------
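To compare the two freezes, it can help to pull the numbers out of the call_bail warnings. The sketch below is a hypothetical helper (parse_bail is not part of GlusterFS). It assumes each log entry sits on a single line, as glusterfs writes it, rather than wrapped as in this mail, and it relies only on the message format visible above.

```python
import re
from datetime import datetime

# Timestamp format used throughout glusterfs.log above.
TS = r"\d{4}-\d\d-\d\d \d\d:\d\d:\d\d"
FMT = "%Y-%m-%d %H:%M:%S"

# Matches one unwrapped "activating bail-out" warning from call_bail.
BAIL_RE = re.compile(
    rf"^(?P<ts>{TS}) W \[client-protocol\.c:\d+:call_bail\] (?P<brick>\S+): "
    rf"activating bail-out\. pending frames = (?P<pending>\d+)\. "
    rf"last sent = (?P<sent>{TS})\. last received = (?P<recv>{TS}) "
    rf"transport-timeout = (?P<timeout>\d+)"
)


def parse_bail(line):
    """Return a summary dict for a call_bail warning, or None if the line differs."""
    m = BAIL_RE.match(line)
    if m is None:
        return None
    now = datetime.strptime(m["ts"], FMT)
    sent = datetime.strptime(m["sent"], FMT)
    recv = datetime.strptime(m["recv"], FMT)
    return {
        "brick": m["brick"],
        "pending": int(m["pending"]),
        "timeout": int(m["timeout"]),
        "since_sent": int((now - sent).total_seconds()),
        # A 1970 date is presumably the Unix epoch shown in local time,
        # i.e. nothing has been received yet; report it as None.
        "since_received": None if recv.year == 1970
        else int((now - recv).total_seconds()),
    }
```

For the brick6-5 entry at 11:11:26 this gives 20 s since the last request was sent and about seven hours since the last reply, against a transport-timeout of 10. The 1970-01-01 01:00:00 value in the 13:33:09 entry looks like an unset timestamp, so the sketch reports it as None rather than as a huge interval.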
On the frozen node, after rebooting, /var/log/glusterfs/glusterfsd.log
shows (for the brick6-0 case):
-------------------------------------------------------------------
2008-06-06 21:10:02 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.205:901)
2008-06-06 21:10:24 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.1.1.1:995)
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.252:997)
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.251:997)
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.250:997)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.249:996)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.248:997)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.247:997)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.246:997)
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.245:997)
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.244:997)
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.243:997)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.242:997)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.241:996)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.240:997)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.239:996)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.238:997)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.237:997)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.236:997)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.235:997)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.234:997)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.233:997)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.232:997)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.231:997)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.230:997)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.229:997)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.228:997)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.227:997)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.226:997)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.225:997)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.224:997)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.223:997)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.222:997)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.221:997)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.220:997)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.219:996)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.218:997)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.217:996)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (10.255.255.216:997)
-------------------------------------------------------------------
and the client log says:
-------------------------------------------------------------------
2008-06-06 21:10:24 E [protocol.c:271:gf_block_unserialize_transport]
local-ns: EOF from peer (10.1.1.1:6996)
2008-06-06 21:10:24 W [client-protocol.c:4759:client_protocol_cleanup]
local-ns: cleaning up state in transport object 0x58d7f0
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport]
brick0-0: EOF from peer (10.255.255.252:6996)
2008-06-06 21:10:25 W [client-protocol.c:4759:client_protocol_cleanup]
brick0-0: cleaning up state in transport object 0x51dc80
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport]
brick0-1: EOF from peer (10.255.255.251:6996)
2008-06-06 21:10:25 W [client-protocol.c:4759:client_protocol_cleanup]
brick0-1: cleaning up state in transport object 0x522b80
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport]
brick0-2: EOF from peer (10.255.255.250:6996)
2008-06-06 21:10:25 W [client-protocol.c:4759:client_protocol_cleanup]
brick0-2: cleaning up state in transport object 0x5274e0
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport]
brick0-3: EOF from peer (10.255.255.249:6996)
2008-06-06 21:10:26 W [client-protocol.c:4759:client_protocol_cleanup]
brick0-3: cleaning up state in transport object 0x52be40
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport]
brick0-4: EOF from peer (10.255.255.248:6996)
2008-06-06 21:10:26 W [client-protocol.c:4759:client_protocol_cleanup]
brick0-4: cleaning up state in transport object 0x5307a0
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport]
brick0-5: EOF from peer (10.255.255.247:6996)
2008-06-06 21:10:26 W [client-protocol.c:4759:client_protocol_cleanup]
brick0-5: cleaning up state in transport object 0x535100
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport]
brick1-0: EOF from peer (10.255.255.246:6996)
2008-06-06 21:10:27 W [client-protocol.c:4759:client_protocol_cleanup]
brick1-0: cleaning up state in transport object 0x539a60
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport]
brick1-1: EOF from peer (10.255.255.245:6996)
2008-06-06 21:10:27 W [client-protocol.c:4759:client_protocol_cleanup]
brick1-1: cleaning up state in transport object 0x53e3c0
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport]
brick1-2: EOF from peer (10.255.255.244:6996)
2008-06-06 21:10:27 W [client-protocol.c:4759:client_protocol_cleanup]
brick1-2: cleaning up state in transport object 0x542d20
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport]
brick1-3: EOF from peer (10.255.255.243:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup]
brick1-3: cleaning up state in transport object 0x547680
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] local-ns:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport]
brick1-4: EOF from peer (10.255.255.242:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup]
brick1-4: cleaning up state in transport object 0x54bfe0
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick1-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick0-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport]
brick1-5: EOF from peer (10.255.255.241:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup]
brick1-5: cleaning up state in transport object 0x550940
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick0-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick1-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport]
brick2-0: EOF from peer (10.255.255.240:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup]
brick2-0: cleaning up state in transport object 0x5552a0
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick0-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick2-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport]
brick2-1: EOF from peer (10.255.255.239:6996)
2008-06-06 21:10:29 W [client-protocol.c:4759:client_protocol_cleanup]
brick2-1: cleaning up state in transport object 0x559c00
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick0-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick2-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick1-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport]
brick2-2: EOF from peer (10.255.255.238:6996)
2008-06-06 21:10:29 W [client-protocol.c:4759:client_protocol_cleanup]
brick2-2: cleaning up state in transport object 0x55e560
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick2-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick0-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick1-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport]
brick2-3: EOF from peer (10.255.255.237:6996)
2008-06-06 21:10:29 W [client-protocol.c:4759:client_protocol_cleanup]
brick2-3: cleaning up state in transport object 0x562ec0
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick0-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport]
brick2-4: EOF from peer (10.255.255.236:6996)
2008-06-06 21:10:30 W [client-protocol.c:4759:client_protocol_cleanup]
brick2-4: cleaning up state in transport object 0x567820
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick1-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport]
brick2-5: EOF from peer (10.255.255.235:6996)
2008-06-06 21:10:30 W [client-protocol.c:4759:client_protocol_cleanup]
brick2-5: cleaning up state in transport object 0x56c180
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick1-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport]
brick3-0: EOF from peer (10.255.255.234:6996)
2008-06-06 21:10:30 W [client-protocol.c:4759:client_protocol_cleanup]
brick3-0: cleaning up state in transport object 0x570ae0
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick1-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick3-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick2-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport]
brick3-1: EOF from peer (10.255.255.233:6996)
2008-06-06 21:10:31 W [client-protocol.c:4759:client_protocol_cleanup]
brick3-1: cleaning up state in transport object 0x575440
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] local-ns:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick1-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick2-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick0-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick1-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport]
brick3-2: EOF from peer (10.255.255.232:6996)
2008-06-06 21:10:31 W [client-protocol.c:4759:client_protocol_cleanup]
brick3-2: cleaning up state in transport object 0x579da0
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick2-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick0-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick1-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport]
brick3-3: EOF from peer (10.255.255.231:6996)
2008-06-06 21:10:31 W [client-protocol.c:4759:client_protocol_cleanup]
brick3-3: cleaning up state in transport object 0x57e700
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick0-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick2-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport]
brick3-4: EOF from peer (10.255.255.230:6996)
2008-06-06 21:10:32 W [client-protocol.c:4759:client_protocol_cleanup]
brick3-4: cleaning up state in transport object 0x583060
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick0-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick2-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport]
brick3-5: EOF from peer (10.255.255.229:6996)
2008-06-06 21:10:32 W [client-protocol.c:4759:client_protocol_cleanup]
brick3-5: cleaning up state in transport object 0x5879c0
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick0-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick2-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport]
brick4-0: EOF from peer (10.255.255.228:6996)
2008-06-06 21:10:32 W [client-protocol.c:4759:client_protocol_cleanup]
brick4-0: cleaning up state in transport object 0x520680
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick4-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick3-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick0-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick2-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport]
brick4-1: EOF from peer (10.255.255.227:6996)
2008-06-06 21:10:33 W [client-protocol.c:4759:client_protocol_cleanup]
brick4-1: cleaning up state in transport object 0x525000
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick3-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick1-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick2-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport]
brick4-2: EOF from peer (10.255.255.226:6996)
2008-06-06 21:10:33 W [client-protocol.c:4759:client_protocol_cleanup]
brick4-2: cleaning up state in transport object 0x529960
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick3-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick1-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick2-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport]
brick4-3: EOF from peer (10.255.255.225:6996)
2008-06-06 21:10:33 W [client-protocol.c:4759:client_protocol_cleanup]
brick4-3: cleaning up state in transport object 0x52e2c0
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick3-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport]
brick4-4: EOF from peer (10.255.255.224:6996)
2008-06-06 21:10:34 W [client-protocol.c:4759:client_protocol_cleanup]
brick4-4: cleaning up state in transport object 0x532c20
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick3-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport]
brick4-5: EOF from peer (10.255.255.223:6996)
2008-06-06 21:10:34 W [client-protocol.c:4759:client_protocol_cleanup]
brick4-5: cleaning up state in transport object 0x537580
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick3-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport]
brick5-0: EOF from peer (10.255.255.222:6996)
2008-06-06 21:10:34 W [client-protocol.c:4759:client_protocol_cleanup]
brick5-0: cleaning up state in transport object 0x53bee0
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick5-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick3-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport]
brick5-1: EOF from peer (10.255.255.221:6996)
2008-06-06 21:10:35 W [client-protocol.c:4759:client_protocol_cleanup]
brick5-1: cleaning up state in transport object 0x540840
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick2-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick4-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick3-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport]
brick5-2: EOF from peer (10.255.255.220:6996)
2008-06-06 21:10:35 W [client-protocol.c:4759:client_protocol_cleanup]
brick5-2: cleaning up state in transport object 0x5451a0
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick2-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick4-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick3-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport]
brick5-3: EOF from peer (10.255.255.219:6996)
2008-06-06 21:10:35 W [client-protocol.c:4759:client_protocol_cleanup]
brick5-3: cleaning up state in transport object 0x549b00
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick2-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick4-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport]
brick5-4: EOF from peer (10.255.255.218:6996)
2008-06-06 21:10:36 W [client-protocol.c:4759:client_protocol_cleanup]
brick5-4: cleaning up state in transport object 0x54e460
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick2-3:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick4-1:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] local-ns:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport]
brick5-5: EOF from peer (10.255.255.217:6996)
2008-06-06 21:10:36 W [client-protocol.c:4759:client_protocol_cleanup]
brick5-5: cleaning up state in transport object 0x552dc0
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-5:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick2-4:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick0-0:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick4-2:
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:11:00 W [nufa.c:47:nufa_init] nufa: No option for limit
min-free-disk given, defaulting it to 15
2008-06-06 21:11:00 W [nufa.c:55:nufa_init] nufa: No option for
nufa.refresh-interval given, defaulting it to 30
-------------------------------------------------------------------
It seems to us that gluster still considers the frozen node alive, at
least to some extent, so it does not stop treating it as part of the
filesystem. Any ideas on what is happening, and how we could overcome
it? Thanks in advance,
Ricardo Garcia Mayoral
Computational Fluid Mechanics
ETSI Aeronauticos, Universidad Politecnica de Madrid
Pz Cardenal Cisneros 3, 28040 Madrid, Spain.
Phone: (+34) 913363291 Fax: (+34) 913363295
e-mail: ricardo@xxxxxxxxxxxxxxxxxx