On Aug 4, 2014, at 10:53 PM, Christian Balzer wrote:
> On Mon, 4 Aug 2014 15:11:39 -0400 Chris Kitzmiller wrote:
>> On Aug 2, 2014, at 12:03 AM, Christian Balzer wrote:
>>> On Fri, 1 Aug 2014 14:23:28 -0400 Chris Kitzmiller wrote:
>>>> I have 3 nodes each running a MON and 30 OSDs.
>>>> ...
>>>> When I test my cluster
>>>> with either rados bench or with fio via a 10GbE client using RBD I get
>>>> great initial speeds >900MBps and I max out my 10GbE links for a
>>>> while. Then, something goes wrong: the performance falters and the
>>>> cluster stops responding altogether. I'll see a monitor call for a
>>>> new election and then my OSDs mark each other down, they complain
>>>> that they've been wrongly marked down, I get slow request warnings of
>>>> 30 and >60 seconds. This eventually resolves itself and the cluster
>>>> recovers but it then recurs again right away. Sometimes, via fio, I'll get
>>>> an I/O error and it will bail.

This appears to still be happening. :(