RE: poor write performance

> 
> On 04/19/2013 08:30 PM, James Harper wrote:
> >>> rados -p <pool> -b 4096 bench 300 seq -t 64
> >>
> >> sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >>       0       0         0         0         0         0         -         0
> >> read got -2
> >> error during benchmark: -5
> >> error 5: (5) Input/output error
> >>
> >> not sure what that's about...
> >>
> >
> > Oops... I typo'd --no-cleanup. Now I get:
> >
> >     sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >       0       0         0         0         0         0         -         0
> >   Total time run:        0.243709
> > Total reads made:     1292
> > Read size:            4096
> > Bandwidth (MB/sec):    20.709
> >
> > Average Latency:       0.0118838
> > Max latency:           0.031942
> > Min latency:           0.001445
> >
> > So it finishes instantly without seeming to do much actual testing...
> 
> My bad.  I forgot to tell you to do a sync/flush on the OSDs after the
> write test.  All of those reads are probably coming from pagecache.  The
> good news is that this is demonstrating that reading 4k objects from
> pagecache isn't insanely bad on your setup (for larger sustained loads I
> see 4k object reads from pagecache hit up to around 100MB/s with
> multiple clients on my test nodes).
> 
> On your OSD nodes try:
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> 
> right before you run the read test.
> 

I tell it to test for 300 seconds and it tests for 0 seconds, so I must be doing something else wrong.
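As I understand it, the seq test only replays objects left behind by an earlier write bench with --no-cleanup, so it finishes as soon as it has read them all, regardless of the time you ask for. The full sequence would presumably look something like this (pool name "rbd" is just a placeholder):

```shell
# Sketch only: pool name, runtimes and thread counts are placeholders.
# 1. Write 4k objects and keep them around for the read test:
rados -p rbd bench 300 write -b 4096 -t 64 --no-cleanup

# 2. On each OSD node, push writes to disk and empty the pagecache
#    so the reads actually hit the drives:
sync
echo 3 > /proc/sys/vm/drop_caches

# 3. Sequential read test over the objects written in step 1:
rados -p rbd bench 300 seq -t 64
```

If the write phase only runs briefly, the read phase will exhaust the objects quickly and exit early, which would explain the 0.24 second run above.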

> Whatever issue you are facing is probably down at the filestore level or
> possibly lower down yet.
> 
> How do your drives benchmark with something like fio doing random 4k
> writes?  Are your drives dedicated for ceph?  What filesystem?  Also
> what is the journal device you are using?
> 

Drives are dedicated to Ceph. I originally put my journals on /, but that was ext3 and my throughput dropped even further, so for now the journal shares the OSD disk.
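To answer the fio question, a 4k random-write job along these lines is roughly what I'd run (a sketch only; the filename, size and runtime are placeholders, and pointing fio at a raw device destroys its contents):

```ini
; Sketch only: filename, size and runtime are placeholders.
; Run against a scratch file, or an idle scratch device if you
; want to take the filesystem out of the picture.
[randwrite-4k]
ioengine=libaio
direct=1
rw=randwrite
bs=4k
iodepth=32
time_based
runtime=60
size=1g
filename=/tmp/fio-testfile
```

Then run it with `fio randwrite-4k.fio` and compare the reported IOPS against what the cluster is achieving.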

I upgraded to 0.60 and that seems to have made a big difference. If I kill off one of my OSDs I get around 20MB/s throughput in live testing (a test restore of a Xen Windows VM from a USB backup), which is pretty much the limit of the USB disk. If I reactivate the second OSD, throughput drops back to ~10MB/s, which isn't as good but is much better than I was getting.
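If I do move the journals off the OSD disks again, I gather a dedicated partition can be pointed at in ceph.conf; something like this sketch (the device path and size are placeholders, not my actual setup):

```ini
[osd.0]
; Sketch only: device path and size are placeholders.
; Point this OSD's journal at a dedicated partition instead of a
; file on the OSD's own disk.
osd journal = /dev/sdb1
; Journal size in MB.
osd journal size = 1024
```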

Thanks

James

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



