Hello,

On Fri, 1 Aug 2014 14:23:28 -0400 Chris Kitzmiller wrote:

> I have 3 nodes each running a MON and 30 OSDs.

Given the HW you list below, that might be a tall order, particularly
CPU-wise in certain situations.

What is your OS running off, HDDs or SSDs? The leveldbs, for the MONs in
particular, are going to be very active and will need a lot of IOPS on
top of the massive logging all these daemons will produce. If all of
this isn't living on fast SSD(s) you are likely going to have problems.

> When I test my cluster with either rados bench or with fio via a 10GbE
> client using RBD I get great initial speeds >900MBps and I max out my
> 10GbE links for a while. Then something goes wrong, the performance
> falters and the cluster stops responding altogether. I'll see a monitor
> call for a new election and then my OSDs mark each other down, they
> complain that they've been wrongly marked down, I get slow request
> warnings of >30 and >60 seconds. This eventually resolves itself and
> the cluster recovers, but then it recurs right away. Sometimes, via
> fio, I'll get an I/O error and it will bail.
>
> The amount of time for the cluster to start acting up varies. Sometimes
> it is great for hours, sometimes it fails after 10 seconds. Nothing
> significant shows up in dmesg. A snippet from ceph-osd.77.log (for
> example) is at: http://pastebin.com/Zb92Ei7a
>
> I'm not sure why I can run at full speed for a little while or what the
> problem is when it stops working. Please help!
>

Full speed for a moment at least is easy to explain: that would be the
journals, which can go full blast until they have to write to the actual
HDDs.

Monitor with atop or iostat what happens when the performance goes to
hell; is it a particular OSD that causes this, and so forth.

> My nodes:
> Ubuntu 14.04 - Linux storage3 3.13.0-32-generic #57-Ubuntu SMP
>   Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 2 x 6-core Xeon 2620s
> 64GB RAM
> 30 x 3TB Seagate ST3000DM001-1CH166

These are particularly nasty pieces of shit, at least depending on the
firmware. Some models/firmware revisions will constantly do load cycles,
caused by an APM setting that cannot be permanently disabled, thus not
only exceeding the max load cycle count in a fraction of the expected
lifetime of these disks, but also impacting performance of the drive
when it happens, up to the point of at least temporarily freezing them.
Which would nicely explain what you're seeing.

A "smartctl -a" output from one of these would be interesting.

> 6 x 128GB Samsung 840 Pro SSD
> 1 x Dual port Broadcom NetXtreme II 5771x/578xx 10GbE
>

Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
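
P.S.: To check where the mon leveldb and the Ceph logs actually live and
how big they are (the paths below assume a default Ubuntu/Ceph layout;
adjust if yours differs), something like this tells you quickly:

    # size of the monitor leveldb and of the log directory
    du -sh /var/lib/ceph/mon/*/store.db /var/log/ceph
    # which filesystems/devices back those paths
    df -h /var/lib/ceph /var/log/ceph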
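For watching the disks while a rados bench or fio run goes sour, the
basic invocations on each storage node would be along these lines; the
OSD data disks and journal SSDs show up as individual devices:

    # extended per-device statistics every 2 seconds, watch await and %util
    iostat -x 2
    # or interactively, per process and per disk
    atop 2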
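And for the Seagates (replace /dev/sdX with one of the ST3000DM001s),
the interesting bits are the Load_Cycle_Count attribute (ID 193) and the
APM level. hdparm can raise or disable APM on many drives, but as noted
above the setting does not stick permanently on these, so treat it as a
test rather than a fix:

    # full SMART output, look at attribute 193 Load_Cycle_Count
    smartctl -a /dev/sdX
    # current APM level
    hdparm -B /dev/sdX
    # try disabling APM (255) for a test; not persistent on these drives
    hdparm -B 255 /dev/sdX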