Hello,

On Fri, 1 Aug 2014 14:23:28 -0400 Chris Kitzmiller wrote:

> I have 3 nodes each running a MON and 30 OSDs.

Given the HW you list below, that might be a tall order, particularly
CPU-wise in certain situations.

What is your OS running off, HDDs or SSDs? The leveldbs, for the MONs in
particular, are going to be very active and will need a lot of IOPS on
top of the massive logging all these daemons will produce. If all of
this isn't living on fast SSD(s) you are likely going to have problems.

> When I test my cluster with either rados bench or with fio via a 10GbE
> client using RBD I get great initial speeds >900MBps and I max out my
> 10GbE links for a while. Then something goes wrong, the performance
> falters and the cluster stops responding altogether. I'll see a monitor
> call for a new election and then my OSDs mark each other down, they
> complain that they've been wrongly marked down, I get slow request
> warnings of >30 and >60 seconds. This eventually resolves itself and
> the cluster recovers, but then it recurs right away. Sometimes, via
> fio, I'll get an I/O error and it will bail.
>
> The amount of time for the cluster to start acting up varies. Sometimes
> it is great for hours, sometimes it fails after 10 seconds. Nothing
> significant shows up in dmesg. A snippet from ceph-osd.77.log (for
> example) is at: http://pastebin.com/Zb92Ei7a
>
> I'm not sure why I can run at full speed for a little while or what the
> problem is when it stops working. Please help!
>

Full speed for a moment at least is easy to explain: that would be the
journals, which can go full blast until they have to write to the actual
HDDs.

Monitor with atop or iostat what happens when the performance goes to
hell; is it a particular OSD that causes this, and so forth.

> My nodes:
> Ubuntu 14.04 - Linux storage3 3.13.0-32-generic #57-Ubuntu SMP
>   Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 2 x 6-core Xeon 2620s
> 64GB RAM
> 30 x 3TB Seagate ST3000DM001-1CH166

These are particularly nasty pieces of shit, at least depending on the
firmware. Some models/firmware revisions will constantly do load cycles,
caused by an APM setting that cannot be permanently disabled, thus not
only exceeding the max load cycle count in a fraction of the expected
lifetime of these disks, but also impacting performance of the drive
when it happens, up to the point of at least temporarily freezing them.
Which would nicely explain what you're seeing.

A "smartctl -a" output from one of these would be interesting.

> 6 x 128GB Samsung 840 Pro SSD
> 1 x Dual port Broadcom NetXtreme II 5771x/578xx 10GbE
>

Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
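
P.S.: To check where the mon leveldb and the Ceph logs actually live and
how big they are (the paths below assume a default Ubuntu/Ceph layout;
adjust if yours differs), something like this tells you quickly:

    # size of the monitor leveldb and of the log directory
    du -sh /var/lib/ceph/mon/*/store.db /var/log/ceph
    # which filesystems/devices back those paths
    df -h /var/lib/ceph /var/log/ceph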
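For watching the disks while a rados bench or fio run goes sour, the
basic invocations on each storage node would be along these lines; the
OSD data disks and journal SSDs show up as individual devices:

    # extended per-device statistics every 2 seconds, watch await and %util
    iostat -x 2
    # or interactively, per process and per disk
    atop 2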
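And for the Seagates (replace /dev/sdX with one of the ST3000DM001s),
the interesting bits are the Load_Cycle_Count attribute (ID 193) and the
APM level. hdparm can raise or disable APM on many drives, but as noted
above the setting does not stick permanently on these, so treat it as a
test rather than a fix:

    # full SMART output, look at attribute 193 Load_Cycle_Count
    smartctl -a /dev/sdX
    # current APM level
    hdparm -B /dev/sdX
    # try disabling APM (255) for a test; not persistent on these drives
    hdparm -B 255 /dev/sdX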