On Mon, Dec 1, 2014 at 1:09 PM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Hi Ilya,
>
>> On 28 Nov 2014, at 17:56, Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx> wrote:
>>
>> On Fri, Nov 28, 2014 at 5:46 PM, Dan Van Der Ster
>> <daniel.vanderster@xxxxxxx> wrote:
>>> Hi Andrei,
>>> Yes, I’m testing from within the guest.
>>>
>>> Here is an example. First, I do 2MB reads when max_sectors_kb=512, and
>>> we see each read split into 4. (fio sees 25 iops, though iostat reports
>>> 100 smaller iops):
>>>
>>> # echo 512 > /sys/block/vdb/queue/max_sectors_kb   # this is the default
>>> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio \
>>>       --direct=1 --runtime=10s --blocksize=2m
>>> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
>>> fio-2.0.13
>>> Starting 1 process
>>> Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0 iops] [eta 00m:00s]
>>>
>>> Meanwhile iostat is reporting 100 iops of average size 1024 sectors
>>> (i.e. 512kB):
>>>
>>> Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
>>> vdb        0.00    0.00  100.00  0.00   50.00   0.00   1024.00      3.02  30.25  10.00  100.00
>>>
>>> Now increase max_sectors_kb to 4096 (4MB), and the IOs are no longer split:
>>>
>>> # echo 4096 > /sys/block/vdb/queue/max_sectors_kb
>>> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio \
>>>       --direct=1 --runtime=10s --blocksize=2m
>>> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
>>> fio-2.0.13
>>> Starting 1 process
>>> Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0 iops] [eta 00m:00s]
>>>
>>> iostat reports 100 iops, 4096 sectors each read (i.e. 2MB):
>>>
>>> Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
>>> vdb      300.00    0.00  100.00  0.00  200.00   0.00   4096.00      0.99   9.94   9.94   99.40
>>
>> We set the hard request size limit to rbd object size (4M typically)
>>
>>     blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
>>
>
> Are you referring to librbd or krbd? My observations are limited to
> librbd at the moment. (I didn’t try this on krbd.)

Yes, I was referring to krbd.  But it looks like that patch from
Christoph will change this for qemu+librbd as well - an artificial
soft limit imposed by the VM kernel will disappear.  CC'ing Josh.

>
>> but the block layer then sets the soft limit for fs requests to 512K
>>
>>     BLK_DEF_MAX_SECTORS = 1024,
>>
>>     limits->max_sectors = min_t(unsigned int, max_hw_sectors,
>>                                 BLK_DEF_MAX_SECTORS);
>>
>> which you are supposed to change on a per-device basis via sysfs.  We
>> could probably raise the soft limit to rbd object size by default as
>> well - I don't see any harm in that.
>>
>
> Indeed, this patch, which was being targeted for 3.19:
>
> https://lkml.org/lkml/2014/9/6/123

Oh good, I was just about to send a patch for krbd.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
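
Until a defaults change like the one above lands, the guest-side soft limit
discussed in this thread can be raised persistently with a udev rule. The
sketch below is only an illustration, not something from the thread: the rule
filename, the vd[a-z] device match, and the 4096 KB value are all assumptions
that depend on your setup (4096 KB corresponds to the typical 4M rbd object
size mentioned above).

    # /etc/udev/rules.d/99-virtio-max-sectors.rules   (hypothetical filename)
    # Raise the per-device soft limit so large direct reads are not split
    # into 512K requests by the guest kernel. The 4096 KB value is an
    # assumption matching a 4M rbd object size.
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="vd[a-z]", ATTR{queue/max_sectors_kb}="4096"

    # One-off, non-persistent equivalent for an already attached disk,
    # as used in the tests above:
    # echo 4096 > /sys/block/vdb/queue/max_sectors_kb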