Re: krbd splitting large IO's into smaller IO's

On Fri, Jun 26, 2015 at 3:17 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> Hi Ilya,
>
> I saw your recent email talking about krbd splitting large IO's into
> smaller IO's; see the link below.
>
> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg20587.html
>
> I just tried it on my ceph cluster using kernel 3.10.0-1. I adjusted both
> max_sectors_kb and max_hw_sectors_kb of the rbd device to 4096.
>
> Using fio with a 4M block size for read:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> rbd3             81.00     0.00  135.00    0.00   108.00     0.00  1638.40     2.72   20.15   20.15    0.00   7.41 100.00
>
>
> Using fio with a 1M or 2M block size for read:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> rbd3              0.00     0.00  213.00    0.00   106.50     0.00  1024.00     2.56   12.02   12.02    0.00   4.69 100.00
>
>
> Using fio with a 4M block size for write:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> rbd3              0.00    40.00    0.00   40.00     0.00    40.00  2048.00     2.87   70.90    0.00   70.90  24.90  99.60
>
>
> Using fio with a 1M or 2M block size for write:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> rbd3              0.00     0.00    0.00   80.00     0.00    40.00  1024.00     3.55   48.20    0.00   48.20  12.50 100.00
>
>
> So why is the IO size here far less than 4096? (With the default value of
> 512, the IO size is always 1024.) Are there other parameters that need to
> be adjusted, or is it about this kernel version?

It's about this kernel version.  Assuming you are doing direct I/Os
with fio, setting max_sectors_kb to 4096 is really the only thing you
can do, and that's enough to *sometimes* see 8192-sector (i.e. 4M) I/Os.
The problem is the max_segments value, which in 3.10 is 128 and which
you cannot adjust via sysfs.
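
For reference, here is a minimal sketch of that tuning plus a direct
I/O fio run (device name rbd3 taken from your iostat output; the fio
parameters are only an example):

    # request size cap; this one is writable via sysfs
    echo 4096 > /sys/block/rbd3/queue/max_sectors_kb

    # the limit that actually gets in the way; read-only on 3.10
    cat /sys/block/rbd3/queue/max_segments    # reports 128

    # 4M direct sequential reads against the device
    fio --name=readtest --filename=/dev/rbd3 --ioengine=libaio \
        --direct=1 --rw=read --bs=4M --iodepth=4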

It all comes down to the memory allocator.  To get a 4M I/O, the total
number of segments (physically contiguous chunks of memory) across the
8 bios (8*512k = 4M) that need to be merged has to be <= 128.  When you
happen to be allocated nice, contiguous bios, you get 4M I/Os.  In other
cases you don't.
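
A back-of-the-envelope count, assuming 4k pages:

    # a 4M direct I/O is submitted as 8 bios of 512k each
    echo $(( 4096 / 512 ))       # 8 bios
    # best case, each bio sits in one physically contiguous chunk:
    #   8 segments total, well under 128, so everything merges into
    #   a single 4M request
    # worst case, no two pages are physically adjacent:
    echo $(( 4 * 1024 / 4 ))     # 1024 segments, 8x over the limit
    #   the merge is refused and you see the smaller requests from
    #   your iostat output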

This will be fixed in 4.2, along with a bunch of other things.  This
particular max_segments fix is a one-liner, so we will probably backport
it to older kernels, including 3.10.
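
Once a kernel has the fix you can see it straight from sysfs.  Roughly
(the fixed value below is illustrative; the idea is that it covers
a whole object, i.e. 8192 sectors for the default 4M object size):

    # unfixed 3.10: stuck at the block-layer default
    cat /sys/block/rbd3/queue/max_segments    # 128
    # fixed kernel: big enough that even a fully fragmented 4M I/O
    # fits, so merging no longer depends on allocator luck
    cat /sys/block/rbd3/queue/max_segments    # e.g. 8192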

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


