Greetings glusterfs users,

I have the errors below in /var/log/glusterfs.log. My guess is that this is simply a network error that the software handled adequately, but that is not at all obvious from the log.

* Were these network errors handled by AFR?
* Without AFR, would the application have seen a filesystem error (e.g. "Transport endpoint not connected")? What if the network error had been on the namespace brick?
* Is there a recommended action for errors in the error log, or some other way of ensuring the integrity of the filesystem (like glusterfsck ...)?

The volume is defined as ...

volume u100-node6
  type protocol/client
  option transport-type tcp/client
  option transport-timeout 10sec
  option remote-host node6
  option remote-subvolume u100-node6
  option username dkpaa
  option password XXXXXXXXASDBH
end-volume

volume afr4
  type cluster/afr
  subvolumes u100-node7 u100-node6
end-volume

volume unify0
  type cluster/unify
  subvolumes afr0 afr1 afr2 afr3 afr4
  option namespace u25-node4
  option rr.limits.min-free-disk 5%
  option scheduler rr
end-volume

Log file says:

2008-11-11 01:57:01 C [client-protocol.c:212:call_bail] u100-node6: bailing transport
2008-11-11 01:57:01 E [client-protocol.c:4834:client_protocol_cleanup] u100-node6: forced unwinding frame type(1) op(14) reply=@0x860d208
2008-11-11 01:57:01 E [client-protocol.c:3254:client_write_cbk] u100-node6: no proper reply from server, returning ENOTCONN
2008-11-11 01:57:01 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107
2008-11-11 02:36:14 C [client-protocol.c:212:call_bail] u100-node6: bailing transport
2008-11-11 02:36:14 E [client-protocol.c:4834:client_protocol_cleanup] u100-node6: forced unwinding frame type(1) op(14) reply=@0x89308a8
2008-11-11 02:36:14 E [client-protocol.c:3254:client_write_cbk] u100-node6: no proper reply from server, returning ENOTCONN
2008-11-11 02:36:14 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.out.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107
2008-11-11 03:06:34 E [client-protocol.c:1238:client_flush] u100-node6: : returning EBADFD
2008-11-11 03:06:34 E [afr.c:2649:afr_flush_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/mirror_metadata.2008-11-10T01:30:12+02:00.snapshot.gz child=u100-node6) op_ret=-1 op_errno=77

On the server side, it sees the client going away:

2008-11-11 01:57:01 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1001)
2008-11-11 01:57:01 E [server-protocol.c:186:generic_reply] server: transport_writev failed
2008-11-11 02:36:14 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1021)
2008-11-11 02:36:14 E [server-protocol.c:186:generic_reply] server: transport_writev failed
2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:999)
2008-11-11 07:30:39 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.15.43:1020)
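
In case it helps anyone reading the client log above, here is a minimal sketch of how the AFR callback errors could be tallied per child and errno. It is not part of glusterfs: the names `AFR_ERROR`, `summarize_afr_errors`, `describe` and `SAMPLE` are hypothetical, and the regular expression simply assumes the line layout seen in this log. The log itself pairs op_errno=107 with ENOTCONN and op_errno=77 with EBADFD, which matches Linux errno numbering; symbolic names printed on other platforms may differ.

```python
import errno
import os
import re
from collections import Counter

# Assumed layout, taken from the afr.c lines in the log above:
#   ... [afr.c:<line>:<callback>] <afr volume>: (path=<path> child=<child>) op_ret=<n> op_errno=<n>
AFR_ERROR = re.compile(
    r"\[afr\.c:\d+:(?P<callback>\w+)\]\s+(?P<volume>\S+):\s+"
    r"\(path=(?P<path>.*?)\s+child=(?P<child>[^)\s]+)\)\s+"
    r"op_ret=(?P<op_ret>-?\d+)\s+op_errno=(?P<op_errno>\d+)"
)


def summarize_afr_errors(lines):
    """Count failed AFR callbacks, keyed by (afr volume, child, op_errno)."""
    counts = Counter()
    for line in lines:
        match = AFR_ERROR.search(line)
        # Only lines with a negative op_ret represent a failed operation.
        if match and int(match.group("op_ret")) < 0:
            key = (
                match.group("volume"),
                match.group("child"),
                int(match.group("op_errno")),
            )
            counts[key] += 1
    return counts


def describe(op_errno):
    """Render an errno number using the local platform's errno table."""
    name = errno.errorcode.get(op_errno, "unknown")
    return f"{op_errno} {name} ({os.strerror(op_errno)})"


# Lines copied from the client log above; the call_bail line is ignored.
SAMPLE = [
    "2008-11-11 01:57:01 C [client-protocol.c:212:call_bail] u100-node6: bailing transport",
    "2008-11-11 01:57:01 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107",
    "2008-11-11 02:36:14 E [afr.c:2393:afr_writev_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/increments/home/pcformat/tmp/analog/cache.out.2008-11-10T01:30:12+02:00.diff.gz child=u100-node6) op_ret=-1 op_errno=107",
    "2008-11-11 03:06:34 E [afr.c:2649:afr_flush_cbk] afr4: (path=/backup5/intelligence.local/rdiff-backup-data/mirror_metadata.2008-11-10T01:30:12+02:00.snapshot.gz child=u100-node6) op_ret=-1 op_errno=77",
]

for (volume, child, code), count in sorted(summarize_afr_errors(SAMPLE).items()):
    print(f"{volume} child={child}: {count} x errno {describe(code)}")
```

A tally like this only shows which child of which AFR volume reported failures; it says nothing about whether the other child (u100-node7 here) completed the same writes, which is the part of the question that still needs an answer.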