hi sage,
We ran a test recently with 5 OSDs on v0.30, OS linux-2.6.39. The read
speed reached 79 MB/s on the first read, and the average climbed to
85-90 MB/s, roughly twice our earlier results, so it improves read
performance a great deal. But we don't know whether it lives up to your
expectations.
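For anyone who wants to reproduce this, a mount along the lines below is
enough to set the new readahead window. This is only a sketch: the monitor
address is the one from our earlier tests, and the 12 MB rasize is just an
example inside the 8-12 MB range you suggest below:

$ mount -t ceph 192.168.1.103:/ /mnt -o rasize=12582912,rsize=0
  # rasize=12582912 bytes = 12 MB; rsize=0 leaves the read size unlimited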
2011/8/4 Sage Weil <sage@xxxxxxxxxxxx>:
> Hi,
>
> I've just pushed a wip-readahead branch to ceph-client.git that rewrites
> ceph_readpages (used for readahead) to be fully asynchronous. This should
> let us take full advantage of whatever the readahead window is. I'm still
> doing some testing on this end, but things look good so far.
>
> There are two relevant mount options:
>
>   rasize=NN - max readahead window size (bytes)
>   rsize=MM  - max read size
>
> rsize defaults to 0 (no limit), which means it effectively maxes out at
> the stripe size (one object, 4 MB by default).
>
> rasize now defaults to 8 MB. This is probably what you'll want to
> experiment with. In practice I think something on the order of 8-12 MB
> will be best, as it will start loading things off disk ~2 objects ahead
> of the current position.
>
> Can you give it a go and see if this helps in your environment?
>
> Thanks!
> sage
>
>
> On Tue, 19 Jul 2011, huang jun wrote:
>> Thanks for your reply.
>> We now see two points that confuse us:
>> 1) The kernel client performs sequential reads through the aio_read
>> function, but from the OSD log the dispatch_queue length on the OSD is
>> always 0. That means the OSD can't get the next READ message until the
>> client sends it; the async read effectively becomes a sync read, so the
>> OSD can't read data in parallel and can't make the most of its
>> resources. What was the original purpose when you designed this part?
>> Perfect reliability?
>
> Right. The old ceph_readpages was synchronous, which slowed things down
> in a couple of different ways.
>
>> 2) In the single-reader case, while the OSD reads data from its disk it
>> does nothing but wait for the read to finish. We think this is a result
>> of 1): the OSD has nothing else to do, so it just waits.
>>
>>
>> 2011/7/19 Sage Weil <sage@xxxxxxxxxxxx>:
>> > On Mon, 18 Jul 2011, huang jun wrote:
>> >> hi all,
>> >> We tested ceph's read performance last week and found something
>> >> weird. We use ceph v0.30 on linux 2.6.37, mounted against a back end
>> >> consisting of 2 OSDs, 1 mon and 1 mds:
>> >> $ mount -t ceph 192.168.1.103:/ /mnt -vv
>> >> $ dd if=/dev/zero of=/mnt/test bs=4M count=200
>> >> $ cd .. && umount /mnt
>> >> $ mount -t ceph 192.168.1.103:/ /mnt -vv
>> >> $ dd if=test of=/dev/zero bs=4M
>> >> 200+0 records in
>> >> 200+0 records out
>> >> 838860800 bytes (839 MB) copied, 16.2327 s, 51.7 MB/s
>> >> But if we use rados to test it:
>> >> $ rados -m 192.168.1.103:6789 -p data bench 60 write
>> >> $ rados -m 192.168.1.103:6789 -p data bench 60 seq
>> >> the result is:
>> >> Total time run:        24.733935
>> >> Total reads made:      438
>> >> Read size:             4194304
>> >> Bandwidth (MB/sec):    70.834
>> >>
>> >> Average Latency:       0.899429
>> >> Max latency:           1.85106
>> >> Min latency:           0.128017
>> >> This phenomenon caught our attention, so we began to analyse the OSD
>> >> debug log. We found that:
>> >> 1) the kernel client sends READ requests of 1 MB at first, and
>> >> 512 KB after that
>> >> 2) from the rados test command's log, the OSD receives READ ops with
>> >> 4 MB of data to handle
>> >> We know the ceph developers pay attention to read and write
>> >> performance, so I just want to confirm: does the communication
>> >> between the client and the OSD take more time than it should? Can we
>> >> request a bigger size for READ operations, such as the default
>> >> object size of 4 MB? Or is this something the OS manages, and if so,
>> >> what can we do to improve performance?
>> >
>> > I think it's related to the way the Linux VFS is doing readahead, and
>> > how the ceph fs code is handling it. It's issue #1122 in the tracker
>> > and I plan to look at it today or tomorrow!
>> >
>> > Thanks-
>> > sage
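P.S. To tie this back to the VFS readahead behaviour mentioned in the July
mail quoted above: the effective per-mount readahead window can also be
read back through the bdi sysfs knob. This is only a sketch; the exact bdi
instance name (ceph-0 here) is a guess and may be numbered differently on
other systems:

$ cat /sys/class/bdi/ceph-0/read_ahead_kb
  # reported in KB; 8192 KB would correspond to the 8 MB rasize default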