Hi Ilya,

> On 28 Nov 2014, at 17:56, Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx> wrote:
>
> On Fri, Nov 28, 2014 at 5:46 PM, Dan Van Der Ster
> <daniel.vanderster@xxxxxxx> wrote:
>> Hi Andrei,
>> Yes, I’m testing from within the guest.
>>
>> Here is an example. First, I do 2MB reads when max_sectors_kb=512, and
>> we see each read is split into four (fio sees 25 iops, though iostat
>> reports 100 smaller iops):
>>
>> # echo 512 > /sys/block/vdb/queue/max_sectors_kb   # this is the default
>> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio \
>>       --direct=1 --runtime=10s --blocksize=2m
>> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
>> fio-2.0.13
>> Starting 1 process
>> Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25/0/0 iops] [eta 00m:00s]
>>
>> Meanwhile, iostat is reporting 100 iops with an average request size of
>> 1024 sectors (i.e. 512kB):
>>
>> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>> vdb        0.00    0.00  100.00   0.00  50.00   0.00   1024.00      3.02  30.25  10.00 100.00
>>
>> Now increase max_sectors_kb to 4MB, and the IOs are no longer split:
>>
>> # echo 4096 > /sys/block/vdb/queue/max_sectors_kb
>> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio \
>>       --direct=1 --runtime=10s --blocksize=2m
>> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
>> fio-2.0.13
>> Starting 1 process
>> Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100/0/0 iops] [eta 00m:00s]
>>
>> iostat reports 100 iops of 4096 sectors each (i.e. 2MB):
>>
>> Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>> vdb      300.00    0.00  100.00   0.00  200.00   0.00   4096.00      0.99   9.94   9.94  99.40
>
> We set the hard request size limit to the rbd object size (4M typically):
>
>     blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
>

Are you referring to librbd or krbd? My observations are limited to librbd at the moment.
(I didn’t try this on krbd.)

> but the block layer then sets the soft limit for fs requests to 512K:
>
>     BLK_DEF_MAX_SECTORS = 1024,
>
>     limits->max_sectors = min_t(unsigned int, max_hw_sectors,
>                                 BLK_DEF_MAX_SECTORS);
>
> which you are supposed to change on a per-device basis via sysfs. We
> could probably raise the soft limit to the rbd object size by default as
> well - I don't see any harm in that.

Indeed - there is a patch that does exactly that, which was being targeted for 3.19: https://lkml.org/lkml/2014/9/6/123

Cheers, Dan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
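[Editor's note: until a kernel with the default change lands, the per-device sysfs tuning discussed in the thread can be scripted. The sketch below is illustrative only; the device name `vdb` is an assumption (substitute your rbd-backed device). It reads the driver's hard limit (`max_hw_sectors_kb`) and raises the soft limit (`max_sectors_kb`) to match, which is roughly what the patch linked above would make the default.]

```shell
# Sketch: raise the block layer's soft request-size limit to the hardware
# limit, so large direct I/Os are no longer split into 512K pieces.
# DEV is an assumption -- replace with your device (vdb, rbd0, ...).
DEV=vdb
Q=/sys/block/$DEV/queue

if [ -r "$Q/max_hw_sectors_kb" ]; then
    hw=$(cat "$Q/max_hw_sectors_kb")   # hard limit set by the driver (4096 for 4M rbd objects)
    soft=$(cat "$Q/max_sectors_kb")    # soft limit, defaults to 512
    echo "$DEV: max_hw_sectors_kb=$hw max_sectors_kb=$soft"
    echo "$hw" > "$Q/max_sectors_kb"   # needs root
    result=tuned
else
    echo "$DEV: not present, nothing to do"
    result=skipped
fi
```

Note that the sysfs setting does not persist across reboots, so it would normally be reapplied at boot (e.g. from a udev rule or an init script).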