Dear All,

I originally asked this question in another thread, but as it is a separate problem I thought it deserved its own thread.

I recently had to extend two volumes, and the layout fixes for both are now running at the same time. One server now has a load of over 70 most of the time (mostly glusterfsd), but none of the others seem particularly busy. I restarted the server in question, but the load quickly climbed back to 70. I can't see any particular reason why this one server should be so badly affected by the layout-fixing processes; it isn't a particularly big server, with only five 3TB bricks involved in the two volumes that were extended.

One possibility is that a lot of batch jobs on our compute cluster are accessing a set of files that happen to live on this one server. That could happen because I have not been able to rebalance the files in the storage volumes for a long time, after the last attempt (on GlusterFS version 3.2) resulted in data corruption. However, poorly distributed files certainly aren't the whole story, because the load on this server remains extremely high even when there is not much running on the compute cluster.

Can anyone suggest a way to troubleshoot this problem? The rebalance logs don't show anything unusual, but glustershd.log contains a lot of metadata split-brain warnings. The brick logs are full of scary-looking warnings, but none flagged 'E' or 'C'. The trouble is that I see messages like these on all the servers, and I can find nothing unusual about the one with a load of 70. Users have been complaining about very poor performance for several weeks, so at the very least I need a work-around that allows people to work normally.

-Dan.

--
Mr. D.A. Bretherton
Computer System Manager
Environmental Systems Science Centre (ESSC)
Department of Meteorology
Harry Pitt Building
3 Earley Gate
University of Reading
Reading, RG6 7BE (or RG6 6AL for postal service deliveries)
UK
Tel. +44 118 378 5205, Fax: +44 118 378 6413
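
P.S. In case anyone asks how I'm watching the load, it's roughly like this (just a sketch; it assumes a Linux /proc filesystem and GNU ps, and that glusterfsd is running):

```shell
# Take one sample: timestamp, 1-minute load average, and the top
# CPU-consuming gluster processes, so load spikes can later be matched
# against batch-job activity on the compute cluster.
load1=$(cut -d' ' -f1 /proc/loadavg)
echo "$(date '+%F %T') load1=${load1}"
# [g]luster in the pattern stops grep from matching its own process line;
# prints nothing (harmlessly) on a host with no gluster processes.
ps -eo pcpu,comm --sort=-pcpu | grep '[g]luster' | head -5
```

Run from cron (or in a loop with a sleep) it gives a log I can line up against job submission times.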