On Thu, Aug 22, 2013 at 2:23 PM, Oliver Daudey <oliver@xxxxxxxxx> wrote:
> Hey Greg,
>
> I encountered a similar problem and we're just in the process of
> tracking it down here on the list. Try downgrading your OSD binaries to
> 0.61.8 Cuttlefish and re-test. If it's significantly faster on RBD,
> you're probably experiencing the same problem I have with Dumpling.
>
> PS: Only downgrade your OSDs. Cuttlefish monitors don't seem to want to
> start with a database that has been touched by a Dumpling monitor, and
> they don't talk to Dumpling monitors, either.
>
> PPS: I've also had OSDs fail to start with an assert while processing
> the journal during these upgrade/downgrade tests, mostly when coming
> back down from Dumpling to Cuttlefish. If you encounter those, delete
> your journal and re-create it with `ceph-osd -i <OSD-ID> --mkjournal'.
> Your data store will be OK, as far as I can tell.

Careful -- deleting the journal potentially throws away updates to your
data store! If this comes up, you should flush the journal with the
Dumpling binary before downgrading.

>
> Regards,
>
>    Oliver
>
> On Thu, 2013-08-22 at 10:55 -0700, Greg Poirier wrote:
>> I have been benchmarking our Ceph installation for the last week or
>> so, and I've come across an issue that I'm having some difficulty
>> with.
>>
>> Ceph bench reports reasonable write throughput at the OSD level:
>>
>> ceph tell osd.0 bench
>> { "bytes_written": 1073741824,
>>   "blocksize": 4194304,
>>   "bytes_per_sec": "47288267.000000"}
>>
>> Running this across all OSDs produces on average 50-55 MB/s, which is
>> fine with us. We were expecting around 100 MB/s / 2 (journal and OSD
>> on the same disk, separate partitions).
>>
>> What I wasn't expecting was the following:
>>
>> I tested 1, 2, 4, 8, 16, 24, and 32 VMs simultaneously writing
>> against 33 OSDs. Aggregate write throughput peaked under 400 MB/s:
>>
>> VMs  Aggregate MB/s
>>  1   196.013671875
>>  2   285.8759765625
>>  4   351.9169921875
>>  8   386.455078125
>> 16   363.8583984375
>> 24   353.6298828125
>> 32   348.9697265625
>>
>> I was hoping to see something closer to the number of OSDs times the
>> average ceph bench value (approximately 1.2 GB/s peak aggregate write
>> throughput).
>>
>> We're seeing excellent read and randread performance, but writes are
>> a bit of a bother.
>>
>> Does anyone have any suggestions?

You don't appear to have accounted for the 2x replication (where all
writes go to two OSDs) in these calculations. I assume your pools have
size 2 (or 3?) for these tests. Size 3 would explain the performance
difference entirely (~1200 MB/s / 3 = ~400 MB/s expected); 2x
replication leaves it still a bit low, but takes the difference down to
~350/600 instead of ~350/1200. :)

You mentioned that your average osd bench throughput was ~50 MB/s;
what's the range? Have you run any rados bench tests? What is your PG
count across the cluster?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
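
A minimal sketch of the commands touched on above, assuming the VMs
write to a pool named "rbd"; the pool name, the 60-second duration, and
the 16 concurrent ops are placeholders to adjust for your own setup:

    # Flush the journal into the data store (with the OSD stopped,
    # using the Dumpling ceph-osd binary) instead of deleting it:
    ceph-osd -i <OSD-ID> --flush-journal

    # Check replication size and placement-group count for the pool
    # under test:
    ceph osd pool get rbd size
    ceph osd pool get rbd pg_num

    # Measure raw RADOS write throughput from a client, bypassing RBD
    # and the VMs: a 60-second write test with 16 concurrent 4 MB ops:
    rados bench -p rbd 60 write -t 16

Comparing the rados bench figure with the per-VM aggregate helps
separate a RADOS-level write ceiling from anything specific to RBD or
the hypervisor.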