Re: large reads become 512 kbyte reads on qemu-kvm rbd


 



On Fri, Nov 28, 2014 at 5:46 PM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Hi Andrei,
> Yes, I’m testing from within the guest.
>
> Here is an example. First, I do 2MB reads when the max_sectors_kb=512, and
> we see the reads are split into 4. (fio sees 25 iops, though iostat reports
> 100 smaller iops):
>
> # echo 512 >  /sys/block/vdb/queue/max_sectors_kb  # this is the default
> # fio --readonly --name /dev/vdb --rw=read --size=1G  --ioengine=libaio
> --direct=1 --runtime=10s --blocksize=2m
> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
> fio-2.0.13
> Starting 1 process
> Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0  iops] [eta
> 00m:00s]
>
> meanwhile iostat is reporting 100 iops of average size 1024 sectors (i.e.
> 512kB):
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
> vdb               0.00     0.00  100.00    0.00    50.00     0.00  1024.00
> 3.02   30.25  10.00 100.00
>
>
>
> Now increase the max_sectors_kb to 4MB, and the IOs are no longer split:
>
> # echo 4096 >  /sys/block/vdb/queue/max_sectors_kb
> # fio --readonly --name /dev/vdb --rw=read --size=1G  --ioengine=libaio
> --direct=1 --runtime=10s --blocksize=2m
> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
> fio-2.0.13
> Starting 1 process
> Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0  iops] [eta
> 00m:00s]
>
> iostat reports 100 iops, 4096 sectors each read (i.e. 2MB):
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
> vdb             300.00     0.00  100.00    0.00   200.00     0.00  4096.00
> 0.99    9.94   9.94  99.40
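The arithmetic behind the two views in the quoted output can be sketched as follows (a quick check, not part of the original mail):

```shell
# A 2 MB read issued by fio is split by the block layer into
# (blocksize / max_sectors_kb) requests, so iostat reports more,
# smaller IOs at the same bandwidth.
fio_bs_kb=2048        # fio --blocksize=2m
max_sectors_kb=512    # default soft limit in sysfs
splits=$((fio_bs_kb / max_sectors_kb))
echo "requests per fio IO: $splits"
echo "device IOPS at 25 fio IOPS: $((25 * splits))"
```

With the limit raised to 4096 KB, `splits` becomes 1 and the two tools agree at 100 IOPS.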

We set the hard request-size limit to the rbd object size (typically 4M):

    blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);

but the block layer then caps the soft limit for filesystem requests at 512K:

   BLK_DEF_MAX_SECTORS  = 1024,

   limits->max_sectors = min_t(unsigned int, max_hw_sectors,
                               BLK_DEF_MAX_SECTORS);

which you are supposed to change on a per-device basis via sysfs.  We
could probably raise the soft limit to the rbd object size by default as
well - I don't see any harm in that.
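For reference, the 512K figure follows directly from BLK_DEF_MAX_SECTORS being counted in 512-byte sectors; a quick sanity check of both limits:

```shell
# BLK_DEF_MAX_SECTORS is in 512-byte sectors: 1024 sectors = 512 KB,
# which matches the default max_sectors_kb seen in sysfs.
BLK_DEF_MAX_SECTORS=1024
SECTOR_SIZE=512
echo $((BLK_DEF_MAX_SECTORS * SECTOR_SIZE / 1024))   # soft limit in KB

# The hard limit set by rbd for a 4 MB object size, in sectors:
echo $(( (4 * 1024 * 1024) / SECTOR_SIZE ))
```

Note that values written to /sys/block/*/queue/max_sectors_kb do not persist across reboots, so a udev rule or boot script is needed to reapply them.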

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




