Hi,

I found a somewhat frustrating test result after the weekend. I started a bonnie on four different clients (so a total of four bonnies in parallel). I have two servers, each with two partitions, which are unified and AFR'd "over cross", so each server holds one brick plus the mirror of the other server's brick, using tla patch-628.

For one thing, the results do not seem too promising, as it took more than 48 hours to complete. Running a bonnie on only one client took "only" about 12 hours (unfortunately, I don't have exact numbers for the runtime).

But even worse, two of the bonnies didn't finish at all. The first client dropped out after approx. 8 hours, claiming "Can't open file ./Bonnie.17791.001". However, the file is (partly) there, also on the AFR mirror, but with different sizes. The log suggests that it was a timeout problem (if I interpret it correctly):

2008-01-06 03:48:10 E [afr.c:3364:afr_close_setxattr_cbk] afr1: (path=/Bonnie.17791.027 child=fsc1) op_ret=-1 op_errno=28
2008-01-06 03:50:34 W [client-protocol.c:209:call_bail] ns1: activating bail-out. pending frames = 1. last sent = 2008-01-06 03:48:17 . last received = 2008-01-06 03:48:17 transport-timeout = 108
2008-01-06 03:50:34 C [client-protocol.c:217:call_bail] ns1: bailing transport
2008-01-06 03:50:34 W [client-protocol.c:4490:client_protocol_cleanup] ns1: cleaning up state in transport object 0x522e40
2008-01-06 03:50:34 E [client-protocol.c:4542:client_protocol_cleanup] ns1: forced unwinding frame type(1) op(5) reply=@0x2aaaab407a00
2008-01-06 03:50:34 E [afr.c:2573:afr_selfheal_lock_cbk] afrns: (path=/Bonnie.17791.001 child=ns1) op_ret=-1 op_errno=107
2008-01-06 03:50:34 E [afr.c:2744:afr_open] afrns: self heal failed, returning EIO
2008-01-06 03:50:34 C [tcp.c:81:tcp_disconnect] ns1: connection disconnected
2008-01-06 03:51:00 E [afr.c:1907:afr_selfheal_sync_file_writev_cbk] afr1: (path=/Bonnie.17791.001 child=fsc1) op_ret=-1 op_errno=28
2008-01-06 03:51:00 E [afr.c:1693:afr_error_during_sync] afr1: error during self-heal
2008-01-06 03:51:03 E [afr.c:2744:afr_open] afr1: self heal failed, returning EIO
2008-01-06 03:51:03 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 12276158: /Bonnie.17791.001 => -1 (5)
2008-01-07 04:40:17 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 15841600: /Bonnie.26672.026 => -1 (2)

The second had a problem creating/removing a directory:

Create files in sequential order...Can't make directory ./Bonnie.26672
Cleaning up test directory after error.
Bonnie: drastic I/O error (rmdir): No such file or directory

On this client, nothing is to be found in the logs. In both cases, there is nothing in the server logs either (neither servers nor clients had any special debug level enabled).

Now, the million-dollar question is: how would I debug this situation, preferably a bit quicker than 48 hours...

Thanks,
Sascha
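
P.S. In case the "over cross" description above is unclear, the client spec looks roughly like the sketch below. The volume names afr1, afrns, fsc1 and ns1 are the ones from the log; the hostnames, remote subvolume names and the second set of volume names are just placeholders for illustration, not a copy of my actual spec file:

  # protocol/client volumes to the four data bricks (two per server)
  volume fsc1                      # brick 1, primary copy on server A
    type protocol/client
    option transport-type tcp/client
    option remote-host serverA     # placeholder hostname
    option remote-subvolume brick1
  end-volume

  volume fsc2                      # mirror of brick 1, on server B
    type protocol/client
    option transport-type tcp/client
    option remote-host serverB     # placeholder hostname
    option remote-subvolume brick1-mirror
  end-volume

  # fsc3/fsc4 are defined the same way for the second brick,
  # with primary copy on server B and mirror on server A;
  # ns1/ns2 likewise point to the namespace bricks on both servers

  volume afr1                      # first brick, mirrored across both servers
    type cluster/afr
    subvolumes fsc1 fsc2
  end-volume

  volume afr2                      # second brick, mirrored the other way round
    type cluster/afr
    subvolumes fsc3 fsc4
  end-volume

  volume afrns                     # mirrored namespace for unify
    type cluster/afr
    subvolumes ns1 ns2
  end-volume

  volume unify0                    # unify over the two AFR pairs
    type cluster/unify
    option namespace afrns
    option scheduler rr
    subvolumes afr1 afr2
  end-volume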