hi sage,
We ran a test recently with 5 OSDs on v0.30, OS linux-2.6.39. The read
speed reached 79 MB/s on the first read, and the average climbed to
85-90 MB/s, roughly twice our earlier results, so it improves read
performance a great deal. But we don't know whether it lives up to your
expectations.
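For anyone who wants to reproduce this, a mount along the lines below is
enough to set the new readahead window. This is only a sketch: the monitor
address is the one from our earlier tests, and the 12 MB rasize is just an
example inside the 8-12 MB range you suggest below:

$ mount -t ceph 192.168.1.103:/ /mnt -o rasize=12582912,rsize=0
  # rasize=12582912 bytes = 12 MB; rsize=0 leaves the read size unlimited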
2011/8/4 Sage Weil <sage@xxxxxxxxxxxx>:
> Hi,
>
> I've just pushed a wip-readahead branch to ceph-client.git that rewrites
> ceph_readpages (used for readahead) to be fully asynchronous. This should
> let us take full advantage of whatever the readahead window is. I'm still
> doing some testing on this end, but things look good so far.
>
> There are two relevant mount options:
>
>   rasize=NN - max readahead window size (bytes)
>   rsize=MM  - max read size
>
> rsize defaults to 0 (no limit), which means it effectively maxes out at
> the stripe size (one object, 4 MB by default).
>
> rasize now defaults to 8 MB. This is probably what you'll want to
> experiment with. In practice I think something on the order of 8-12 MB
> will be best, as it will start loading things off disk ~2 objects ahead
> of the current position.
>
> Can you give it a go and see if this helps in your environment?
>
> Thanks!
> sage
>
>
> On Tue, 19 Jul 2011, huang jun wrote:
>> Thanks for your reply.
>> We now see two points that confuse us:
>> 1) The kernel client performs sequential reads through the aio_read
>> function, but from the OSD log the dispatch_queue length on the OSD is
>> always 0. That means the OSD can't get the next READ message until the
>> client sends it; the async read effectively becomes a sync read, so the
>> OSD can't read data in parallel and can't make the most of its
>> resources. What was the original purpose when you designed this part?
>> Perfect reliability?
>
> Right. The old ceph_readpages was synchronous, which slowed things down
> in a couple of different ways.
>
>> 2) In the single-reader case, while the OSD reads data from its disk it
>> does nothing but wait for the read to finish. We think this is a result
>> of 1): the OSD has nothing else to do, so it just waits.
>>
>>
>> 2011/7/19 Sage Weil <sage@xxxxxxxxxxxx>:
>> > On Mon, 18 Jul 2011, huang jun wrote:
>> >> hi all,
>> >> We tested ceph's read performance last week and found something
>> >> weird. We use ceph v0.30 on linux 2.6.37, mounted against a back end
>> >> consisting of 2 OSDs, 1 mon and 1 mds:
>> >> $ mount -t ceph 192.168.1.103:/ /mnt -vv
>> >> $ dd if=/dev/zero of=/mnt/test bs=4M count=200
>> >> $ cd .. && umount /mnt
>> >> $ mount -t ceph 192.168.1.103:/ /mnt -vv
>> >> $ dd if=test of=/dev/zero bs=4M
>> >> 200+0 records in
>> >> 200+0 records out
>> >> 838860800 bytes (839 MB) copied, 16.2327 s, 51.7 MB/s
>> >> But if we use rados to test it:
>> >> $ rados -m 192.168.1.103:6789 -p data bench 60 write
>> >> $ rados -m 192.168.1.103:6789 -p data bench 60 seq
>> >> the result is:
>> >> Total time run:        24.733935
>> >> Total reads made:      438
>> >> Read size:             4194304
>> >> Bandwidth (MB/sec):    70.834
>> >>
>> >> Average Latency:       0.899429
>> >> Max latency:           1.85106
>> >> Min latency:           0.128017
>> >> This phenomenon caught our attention, so we began to analyse the OSD
>> >> debug log. We found that:
>> >> 1) the kernel client sends READ requests of 1 MB at first, and
>> >> 512 KB after that
>> >> 2) from the rados test command's log, the OSD receives READ ops with
>> >> 4 MB of data to handle
>> >> We know the ceph developers pay attention to read and write
>> >> performance, so I just want to confirm: does the communication
>> >> between the client and the OSD take more time than it should? Can we
>> >> request a bigger size for READ operations, such as the default
>> >> object size of 4 MB? Or is this something the OS manages, and if so,
>> >> what can we do to improve performance?
>> >
>> > I think it's related to the way the Linux VFS is doing readahead, and
>> > how the ceph fs code is handling it. It's issue #1122 in the tracker
>> > and I plan to look at it today or tomorrow!
>> >
>> > Thanks-
>> > sage
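P.S. To tie this back to the VFS readahead behaviour mentioned in the July
mail quoted above: the effective per-mount readahead window can also be
read back through the bdi sysfs knob. This is only a sketch; the exact bdi
instance name (ceph-0 here) is a guess and may be numbered differently on
other systems:

$ cat /sys/class/bdi/ceph-0/read_ahead_kb
  # reported in KB; 8192 KB would correspond to the 8 MB rasize default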