Extremely high load after 100% full bricks

d.a.bretherton at reading.ac.uk (Dan Bretherton) · Mon, 22 Oct 2012 14:03:14 +0100

Dear All,
A replicated pair of servers in my GlusterFS 3.3.0 cluster has been
experiencing extremely high load for the past few days, after a
replicated brick pair became 100% full.  The GlusterFS-related load on
one of the servers was fluctuating at around 60, and this high load
would periodically shift to the other server.  When I noticed the full
bricks I quickly extended the volume by creating new bricks on another
server, and manually moved some data off the full bricks to create space
for write operations.  The subsequent fix-layout operation seemed to start
normally, but the load then increased even further.  The server with the
high load (by then up to about 80) became very slow to respond, and I
noticed a lot of errors like the following in the VOLNAME-rebalance.log
files:

[2012-10-22 00:35:52.070364] W [socket.c:1512:__socket_proto_state_machine] 0-atmos-client-10: reading from socket failed. Error (Transport endpoint is not connected), peer (192.171.166.92:24052)
[2012-10-22 00:35:52.070446] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xe7) [0x2b3fd905c547] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb2) [0x2b3fd905bf42] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2b3fd905bbfe]))) 0-atmos-client-10: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-10-22 00:35:45.454529 (xid=0x285951x)

There have also been occasional errors like the following, referring to
the pair of bricks that became 100% full:

[2012-10-22 01:32:52.827044] W [client3_1-fops.c:5517:client3_1_readdir] 0-atmos-client-15:  (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD
[2012-10-22 09:49:21.103066] W [client3_1-fops.c:5628:client3_1_readdirp] 0-atmos-client-14:  (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD

The log files from the bricks that were 100% full contain many errors
like these, dating from the period after I freed up some space on them:

[2012-10-22 00:40:56.246075] E [server.c:176:server_submit_reply] (-->/usr/lib64/libglusterfs.so.0(default_inodelk_cbk+0xa4) [0x361da23e84] (-->/usr/lib64/glusterfs/3.3.0/xlator/debug/io-stats.so(io_stats_inodelk_cbk+0xd8) [0x2aaaabd74d48] (-->/usr/lib64/glusterfs/3.3.0/xlator/protocol/server.so(server_inodelk_cbk+0x10b) [0x2aaaabf9742b]))) 0-: Reply submission failed
[2012-10-22 00:40:56.246117] I [server-helpers.c:629:server_connection_destroy] 0-atmos-server: destroyed connection of bdan10.nerc-essc.ac.uk-13609-2012/10/21-23:04:53:323865-atmos-client-15-0
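
To check how the errors are distributed across bricks, it may help to tally log entries per severity and per translator name (for example 0-atmos-client-10). The following is a minimal Python sketch, not a GlusterFS tool. It assumes every entry begins with a header of the form [timestamp] LEVEL [file:line:function], as in the excerpts above, and that the translator name is the first whitespace-preceded token shaped like <digits>-<name> followed by a colon. The helper name tally is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed header shape, taken from the excerpts above:
# [YYYY-MM-DD HH:MM:SS.micro] LEVEL [file.c:line:function] rest-of-entry
HEADER = re.compile(r"^\[(\d{4}-\d{2}-\d{2} [\d:.]+)\] ([A-Z]) \[([^\]]+)\](.*)$")
# Assumed translator token: "0-atmos-client-10", "0-atmos-server", or "0-",
# i.e. the first whitespace-preceded "<digits>-<name>:" (skips backtraces).
XLATOR = re.compile(r"(?:^|\s)(\d+-[\w.-]*):\s")


def tally(lines):
    """Count entries per (severity, translator). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        m = HEADER.match(line)
        if not m:
            continue  # not an entry header (e.g. a wrapped continuation line)
        level, rest = m.group(2), m.group(4)
        x = XLATOR.search(rest)
        counts[(level, x.group(1) if x else "?")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python tally.py VOLNAME-rebalance.log
    with open(sys.argv[1], errors="replace") as f:
        for (level, xlator), n in tally(f).most_common():
            print(f"{n:8d} {level} {xlator}")
```

Running it over each server's rebalance and brick logs would show whether the E and W entries cluster on the client translators for the two affected servers.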

All of these errors have occurred only on the replicated pair of servers
that had the 100% full bricks.  I don't know whether the errors are
caused by the high load (resulting in poor communication with other
peers, for example) or whether the high load is the result of
replication and/or distribution errors.  I have tried various things to
bring the load down, including unmounting the volume and stopping the
fix-layout operation, but the only thing that works is stopping the
volume.  Obviously I can't do that for long because people need to use
the data, but with the load as high as it is, data access is very slow
and users are experiencing a lot of temporary I/O errors.  Bricks from
several volumes are on those servers, so everybody in the department is
affected by this problem.  At first I thought the load was caused by
self-heal operations fixing errors left by write failures that occurred
while the bricks were full, but it is glusterfs threads that are causing
the high load, not glustershd.

Can anyone suggest a way to bring the load down so people can access the 
data properly again?  Also, can I trust GlusterFS to eventually 
self-heal the errors causing the above error messages?

Regards,
-Dan.