Sascha,

The logs say op_errno=28, which is ENOSPC (no space left on device). Were you aware of that already?

avati

2008/1/7, Sascha Ottolski <ottolski@xxxxxx>:
>
> Hi,
>
> I found a somewhat frustrating test result after the weekend. I started a
> bonnie on four different clients (so a total of four bonnies in parallel).
> I have two servers, each with two partitions, which are unified and AFR'ed
> "over cross", so each server has a brick and a mirrored brick of the other,
> using tla patch-628.
>
> For one, the results don't seem very promising, as it took more than 48
> hours to complete. Doing a bonnie on only one client took "only" about 12
> hours (unfortunately, I don't have exact numbers about the runtime).
>
> But even worse, two of the bonnies didn't finish at all. The first client
> dropped out after approx. 8 hours, claiming "Can't open
> file ./Bonnie.17791.001". However, the file is (partly) there, also on the
> afr-mirror, but with different sizes. The log suggests that it was a
> timeout problem (if I interpret it correctly):
>
> 2008-01-06 03:48:10 E [afr.c:3364:afr_close_setxattr_cbk] afr1: (path=/Bonnie.17791.027 child=fsc1) op_ret=-1 op_errno=28
> 2008-01-06 03:50:34 W [client-protocol.c:209:call_bail] ns1: activating bail-out. pending frames = 1. last sent = 2008-01-06 03:48:17. last received = 2008-01-06 03:48:17 transport-timeout = 108
> 2008-01-06 03:50:34 C [client-protocol.c:217:call_bail] ns1: bailing transport
> 2008-01-06 03:50:34 W [client-protocol.c:4490:client_protocol_cleanup] ns1: cleaning up state in transport object 0x522e40
> 2008-01-06 03:50:34 E [client-protocol.c:4542:client_protocol_cleanup] ns1: forced unwinding frame type(1) op(5) reply=@0x2aaaab407a00
> 2008-01-06 03:50:34 E [afr.c:2573:afr_selfheal_lock_cbk] afrns: (path=/Bonnie.17791.001 child=ns1) op_ret=-1 op_errno=107
> 2008-01-06 03:50:34 E [afr.c:2744:afr_open] afrns: self heal failed, returning EIO
> 2008-01-06 03:50:34 C [tcp.c:81:tcp_disconnect] ns1: connection disconnected
> 2008-01-06 03:51:00 E [afr.c:1907:afr_selfheal_sync_file_writev_cbk] afr1: (path=/Bonnie.17791.001 child=fsc1) op_ret=-1 op_errno=28
> 2008-01-06 03:51:00 E [afr.c:1693:afr_error_during_sync] afr1: error during self-heal
> 2008-01-06 03:51:03 E [afr.c:2744:afr_open] afr1: self heal failed, returning EIO
> 2008-01-06 03:51:03 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 12276158: /Bonnie.17791.001 => -1 (5)
> 2008-01-07 04:40:17 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 15841600: /Bonnie.26672.026 => -1 (2)
>
>
> The second had a problem creating/removing a directory:
>
> Create files in sequential order...Can't make directory ./Bonnie.26672
> Cleaning up test directory after error.
> Bonnie: drastic I/O error (rmdir): No such file or directory
>
> On this client, nothing shows up in the logs. In both cases, there is
> nothing in the server logs either (neither servers nor clients had a
> special debug level enabled).
>
> Now, the million-dollar question is: how would I debug this situation,
> preferably a bit quicker than 48 hours...
>
> Thanks,
>
> Sascha

--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.
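The op_errno values in the quoted log are ordinary errno numbers, so they can be decoded and tallied mechanically. Here is a minimal Python sketch, assuming the log is decoded on a Linux host like the one that wrote it (errno numbering differs between platforms). The helper names (`errno_name`, `summarize_op_errnos`, `free_space_bytes`) and the command-line usage are hypothetical, and the regex only covers the `op_errno=NN` format seen in this excerpt. On Linux it reports 28 as ENOSPC and 107 as ENOTCONN, which is consistent with a brick running out of space and the ns1 connection later being bailed out.

```python
import errno
import os
import re
import shutil
import sys
from collections import Counter

# Matches the "op_errno=NN" field printed by translators such as afr in the
# quoted log. Other formats (e.g. fuse-bridge's "=> -1 (5)") are NOT covered.
OP_ERRNO_RE = re.compile(r"op_errno=(\d+)")


def errno_name(code: int) -> str:
    """Map a numeric errno to its symbolic name using this host's errno table."""
    return errno.errorcode.get(code, f"UNKNOWN({code})")


def summarize_op_errnos(log_text: str) -> Counter:
    """Count how often each numeric op_errno value appears in the log text."""
    return Counter(int(m.group(1)) for m in OP_ERRNO_RE.finditer(log_text))


def free_space_bytes(path: str) -> int:
    """Free bytes on the filesystem holding `path` (e.g. a brick's export dir)."""
    return shutil.disk_usage(path).free


def main(argv: list) -> None:
    # Hypothetical usage: python decode_errnos.py CLIENT_LOG [BRICK_DIR ...]
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        counts = summarize_op_errnos(f.read())
    for code, count in counts.most_common():
        print(f"op_errno={code} {errno_name(code)} x{count}: {os.strerror(code)}")
    # Report free space for each brick directory given on the command line.
    for path in argv[2:]:
        print(f"{path}: {free_space_bytes(path) / 2**30:.2f} GiB free")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv)
    else:
        print(f"usage: {sys.argv[0]} CLIENT_LOG [BRICK_DIR ...]")
```

Run against the first failing client's log and, on each server, against the brick export directories, this would show at a glance whether ENOSPC dominates and how much room each brick has left.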