On Mon, Dec 1, 2014 at 1:09 PM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Hi Ilya,
>
>> On 28 Nov 2014, at 17:56, Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx> wrote:
>>
>> On Fri, Nov 28, 2014 at 5:46 PM, Dan Van Der Ster
>> <daniel.vanderster@xxxxxxx> wrote:
>>> Hi Andrei,
>>> Yes, I’m testing from within the guest.
>>>
>>> Here is an example. First, I do 2MB reads when max_sectors_kb=512, and
>>> we see each read split into 4. (fio sees 25 iops, though iostat reports
>>> 100 smaller iops):
>>>
>>> # echo 512 > /sys/block/vdb/queue/max_sectors_kb   # this is the default
>>> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio \
>>>       --direct=1 --runtime=10s --blocksize=2m
>>> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
>>> fio-2.0.13
>>> Starting 1 process
>>> Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0 iops] [eta 00m:00s]
>>>
>>> Meanwhile iostat is reporting 100 iops of average size 1024 sectors
>>> (i.e. 512kB):
>>>
>>> Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
>>> vdb        0.00    0.00  100.00  0.00   50.00   0.00   1024.00      3.02  30.25  10.00  100.00
>>>
>>> Now increase max_sectors_kb to 4096 (4MB), and the IOs are no longer split:
>>>
>>> # echo 4096 > /sys/block/vdb/queue/max_sectors_kb
>>> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio \
>>>       --direct=1 --runtime=10s --blocksize=2m
>>> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
>>> fio-2.0.13
>>> Starting 1 process
>>> Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0 iops] [eta 00m:00s]
>>>
>>> iostat reports 100 iops, 4096 sectors each read (i.e. 2MB):
>>>
>>> Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
>>> vdb      300.00    0.00  100.00  0.00  200.00   0.00   4096.00      0.99   9.94   9.94   99.40
>>
>> We set the hard request size limit to rbd object size (4M typically)
>>
>>     blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
>>
>
> Are you referring to librbd or krbd? My observations are limited to
> librbd at the moment. (I didn’t try this on krbd.)

Yes, I was referring to krbd.  But it looks like that patch from
Christoph will change this for qemu+librbd as well - an artificial
soft limit imposed by the VM kernel will disappear.  CC'ing Josh.

>
>> but the block layer then sets the soft limit for fs requests to 512K
>>
>>     BLK_DEF_MAX_SECTORS = 1024,
>>
>>     limits->max_sectors = min_t(unsigned int, max_hw_sectors,
>>                                 BLK_DEF_MAX_SECTORS);
>>
>> which you are supposed to change on a per-device basis via sysfs.  We
>> could probably raise the soft limit to rbd object size by default as
>> well - I don't see any harm in that.
>>
>
> Indeed, this patch, which was being targeted for 3.19:
>
> https://lkml.org/lkml/2014/9/6/123

Oh good, I was just about to send a patch for krbd.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
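
Until a defaults change like the one above lands, the guest-side soft limit
discussed in this thread can be raised persistently with a udev rule. The
sketch below is only an illustration, not something from the thread: the rule
filename, the vd[a-z] device match, and the 4096 KB value are all assumptions
that depend on your setup (4096 KB corresponds to the typical 4M rbd object
size mentioned above).

    # /etc/udev/rules.d/99-virtio-max-sectors.rules   (hypothetical filename)
    # Raise the per-device soft limit so large direct reads are not split
    # into 512K requests by the guest kernel. The 4096 KB value is an
    # assumption matching a 4M rbd object size.
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="vd[a-z]", ATTR{queue/max_sectors_kb}="4096"

    # One-off, non-persistent equivalent for an already attached disk,
    # as used in the tests above:
    # echo 4096 > /sys/block/vdb/queue/max_sectors_kb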