returning EBADFD / no proper reply from server, returning ENOTCONN

list2008 at lunch.za.net (Andrew McGill) · Tue, 11 Nov 2008 10:59:52 +0200

Greetings glusterfs users,

I have the errors below in /var/log/glusterfs.log.  I'm guessing that this 
is simply a network error which the software handled adequately, but that 
is not obvious from the log.  

 * Were these network errors handled by AFR?  

 * Without AFR, would the application have seen a filesystem error 
(e.g. "Transport endpoint not connected")?  (How about if the network error 
was on the namespace brick?)

 * Is there a recommended action for errors in the error log - or some other 
way of ensuring the integrity of the filesystem (like glusterfsck ...)?

The volume is defined as ...

volume u100-node6
 type protocol/client
 option transport-type tcp/client
 option transport-timeout 10sec
 option remote-host node6
 option remote-subvolume u100-node6
 option username dkpaa
 option password XXXXXXXXASDBH
end-volume

volume afr4
  type cluster/afr
  subvolumes u100-node7 u100-node6
end-volume

volume unify0
  type cluster/unify
  subvolumes afr0 afr1 afr2 afr3 afr4
  option namespace u25-node4
  option rr.limits.min-free-disk 5%
  option scheduler rr
end-volume

Log file says:

2008-11-11 01:57:01 C [client-protocol.c:212:call_bail] u100-node6: bailing 
transport
2008-11-11 01:57:01 E [client-protocol.c:4834:client_protocol_cleanup] 
u100-node6: forced unwinding frame type(1) op(14) reply=@0x860d208
2008-11-11 01:57:01 E [client-protocol.c:3254:client_write_cbk] u100-node6: no 
proper reply from server, returning ENOTCONN
2008-11-11 01:57:01 E [afr.c:2393:afr_writev_cbk] afr4: 
(path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.2008-11-10T01:30:12+02:00.diff.gz 
child=u100-node6) op_ret=-1 op_errno=107

2008-11-11 02:36:14 C [client-protocol.c:212:call_bail] u100-node6: bailing 
transport
2008-11-11 02:36:14 E [client-protocol.c:4834:client_protocol_cleanup] 
u100-node6: forced unwinding frame type(1) op(14) reply=@0x89308a8
2008-11-11 02:36:14 E [client-protocol.c:3254:client_write_cbk] u100-node6: no 
proper reply from server, returning ENOTCONN
2008-11-11 02:36:14 E [afr.c:2393:afr_writev_cbk] afr4: 
(path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.out.2008-11-10T01:30:12+02:00.diff.gz 
child=u100-node6) op_ret=-1 op_errno=107

2008-11-11 03:06:34 E [client-protocol.c:1238:client_flush] u100-node6: : 
returning EBADFD
2008-11-11 03:06:34 E [afr.c:2649:afr_flush_cbk] afr4: 
(path=/backup5/intelligence.local/rdiff-backup-data/mirror_metadata.2008-11-10T01:30:12+02:00.snapshot.gz 
child=u100-node6) op_ret=-1 op_errno=77
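
For reference, the op_errno values in the afr lines above look like plain 
Linux errno numbers: 107 would be ENOTCONN and 77 would be EBADFD, which 
matches the client-protocol messages next to them.  A minimal sketch for 
decoding them from a log follows.  It assumes the client runs on Linux (the 
numbers are hard-coded from the Linux headers), and the helper name 
decode_afr_results is hypothetical, not part of glusterfs.

```python
import re

# Linux errno values (assumption: the client runs on Linux; other
# platforms may number these differently).
LINUX_ERRNO = {
    77: ("EBADFD", "File descriptor in bad state"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Matches the tail of an afr callback line, e.g.
#   child=u100-node6) op_ret=-1 op_errno=107
AFR_RESULT = re.compile(
    r"child=(?P<child>[\w-]+)\) op_ret=(?P<ret>-?\d+) op_errno=(?P<errno>\d+)"
)

def decode_afr_results(log_text):
    """Return (child, op_ret, errno_name, description) for each afr result."""
    results = []
    for m in AFR_RESULT.finditer(log_text):
        code = int(m.group("errno"))
        # Unknown codes are passed through as their number.
        name, desc = LINUX_ERRNO.get(code, (str(code), "unknown"))
        results.append((m.group("child"), int(m.group("ret")), name, desc))
    return results

if __name__ == "__main__":
    sample = (
        "child=u100-node6) op_ret=-1 op_errno=107\n"
        "child=u100-node6) op_ret=-1 op_errno=77\n"
    )
    for row in decode_afr_results(sample):
        print(row)
```

The regex only looks at the "child=... op_ret=... op_errno=..." tail, so it 
does not depend on how the mail client wrapped the long path lines.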

The server side sees the client going away:

2008-11-11 01:57:01 E [protocol.c:271:gf_block_unserialize_transport] server: 
EOF from peer (192.168.15.43:1001)
2008-11-11 01:57:01 E [server-protocol.c:186:generic_reply] server: 
transport_writev failed

2008-11-11 02:36:14 E [protocol.c:271:gf_block_unserialize_transport] server: 
EOF from peer (192.168.15.43:1021)
2008-11-11 02:36:14 E [server-protocol.c:186:generic_reply] server: 
transport_writev failed

2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: 
EOF from peer (192.168.15.43:999)
2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: 
EOF from peer (192.168.15.43:1020)
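
The client and server entries at 01:57:01 and 02:36:14 line up to the 
second.  A rough sketch for finding such matches in larger logs is below.  
It assumes both machines have synchronised clocks and that every log entry 
starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the excerpts above; the 
function names are hypothetical.

```python
import re

# A log entry starts with a timestamp such as "2008-11-11 01:57:01 ".
# Wrapped continuation lines do not, so they are skipped.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ", re.MULTILINE)

def timestamps(log_text):
    """Return the set of entry timestamps found in a log excerpt."""
    return set(TIMESTAMP.findall(log_text))

def common_timestamps(client_log, server_log):
    """Return the sorted timestamps that appear in both logs."""
    return sorted(timestamps(client_log) & timestamps(server_log))

if __name__ == "__main__":
    client = (
        "2008-11-11 01:57:01 C [client-protocol.c:212:call_bail] u100-node6: bailing \n"
        "transport\n"
        "2008-11-11 03:06:34 E [client-protocol.c:1238:client_flush] u100-node6: : \n"
        "returning EBADFD\n"
    )
    server = (
        "2008-11-11 01:57:01 E [server-protocol.c:186:generic_reply] server: \n"
        "transport_writev failed\n"
        "2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: \n"
        "EOF from peer (192.168.15.43:999)\n"
    )
    print(common_timestamps(client, server))
```

A matching second is only a hint that the two entries belong to the same 
event; it says nothing about which side dropped the connection first.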