On Thu, Aug 22, 2013 at 5:23 PM, Greg Poirier <greg.poirier@xxxxxxxxxx> wrote:
> On Thu, Aug 22, 2013 at 2:34 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> You don't appear to have accounted for the 2x replication (where all
>> writes go to two OSDs) in these calculations. I assume your pools have
>
> Ah. Right. So I should then be looking at:
>
> # OSDs * Throughput per disk / 2 / repl factor ?
>
> Which makes 300-400 MB/s aggregate throughput actually sort of reasonable.
>
>> size 2 (or 3?) for these tests. 3 would explain the performance
>> difference entirely; 2x replication leaves it still a bit low but
>> takes the difference down to ~350/600 instead of ~350/1200. :)
>
> Yeah. We're doing 2x repl now, and haven't yet made the decision if we're
> going to move to 3x repl or not.
>
>> You mentioned that your average osd bench throughput was ~50MB/s;
>> what's the range?
>
> 41.9 - 54.7 MB/s
>
> The actual average is 47.1 MB/s

Okay. It's important to realize that because Ceph distributes data
pseudorandomly, each OSD is going to end up with about the same amount of
data going to it. If one of your drives is slower than the others, the fast
ones can get backed up waiting on the slow one to acknowledge writes, so
they end up impacting the cluster throughput a disproportionate amount. :(

Anyway, I'm guessing you have 24 OSDs from your math earlier?

47MB/s * 24 / 2 = 564MB/s
41MB/s * 24 / 2 = 492MB/s

So taking out or reducing the weight on the slow ones might improve things
a little. But that's still quite a ways off from what you're seeing -- there
are a lot of things that could be impacting this, but there's probably
something fairly obvious with that much of a gap.

What is the exact benchmark you're running? What do your nodes look like?
-Greg
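
To make the back-of-the-envelope math above concrete, here is a minimal
sketch in Python. The per-OSD figures in osd_bench_mb_s are hypothetical
placeholders standing in for the kind of numbers "ceph tell osd.N bench"
reports (only a few of the 24 OSDs are shown), and the 10% cutoff for
flagging slow OSDs is an arbitrary illustration, not a Ceph recommendation.

#!/usr/bin/env python
# Rough estimate of expected aggregate client write throughput, following
# the arithmetic above: sum the per-OSD bench results and divide by the
# replication factor (each client write is stored repl_factor times).

# Hypothetical per-OSD bench results in MB/s; substitute measured values.
osd_bench_mb_s = {
    0: 47.3, 1: 51.2, 2: 41.9, 3: 54.7, 4: 46.0, 5: 48.8,
}

repl_factor = 2  # pool size used for the test

total = sum(osd_bench_mb_s.values())
mean = total / len(osd_bench_mb_s)
print("Expected aggregate write throughput: ~%.0f MB/s" % (total / repl_factor))

# CRUSH spreads writes pseudorandomly across all OSDs, so the slowest
# drives back everything else up; flag the outliers worth reweighting
# or pulling out of the cluster.
slow = dict((osd, mbps) for osd, mbps in osd_bench_mb_s.items()
            if mbps < 0.9 * mean)
print("OSDs more than 10%% below the mean (%.1f MB/s): %s" % (mean, slow))

Dividing by the pool size is the same correction applied in the thread
above; if the cluster does move to 3x replication, changing repl_factor
to 3 drops the expected aggregate by a third.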