Re: krbd splitting large IO's into smaller IO's

Hi Ilya,

Thanks for your explanation. This makes sense. Will you make max_segments configurable? Could you please point me to the fix you have made? We might help test it.

Thanks.

David Zhang 
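A quick way to inspect the queue limits in question before re-running the test (a sketch, not from the thread; the device name rbd3 is taken from the iostat output quoted below, and the paths are the standard block-layer sysfs knobs):

```shell
# Print the block-queue limits that govern I/O splitting for a device.
# "rbd3" is the device from the iostat output in this thread; adjust as needed.
dev=rbd3
q="/sys/block/$dev/queue"
for knob in max_hw_sectors_kb max_sectors_kb max_segments; do
    [ -r "$q/$knob" ] && echo "$knob = $(cat "$q/$knob")"
done
# Requesting larger I/Os (needs root; capped by max_hw_sectors_kb):
# echo 4096 > "$q/max_sectors_kb"
```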


> Date: Fri, 26 Jun 2015 18:21:55 +0300
> Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's
> From: idryomov@xxxxxxxxx
> To: zhangz.david@xxxxxxxxxxx
> CC: ceph-users@xxxxxxxxxxxxxx
>
> On Fri, Jun 26, 2015 at 3:17 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> > Hi Ilya,
> >
> > I am seeing your recent email talking about krbd splitting large IO's into
> > smaller IO's, see below link.
> >
> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg20587.html
> >
> > I just tried it on my ceph cluster using kernel 3.10.0-1. I adjust both
> > max_sectors_kb and max_hw_sectors_kb of rbd device to 4096.
> >
> > Use fio with 4M block size for read:
> >
> > Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> > rbd3      81.00    0.00  135.00   0.00  108.00   0.00   1638.40      2.72  20.15    20.15     0.00   7.41 100.00
> >
> >
> > Use fio with 1M or 2M block size for read:
> >
> > Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> > rbd3       0.00    0.00  213.00   0.00  106.50   0.00   1024.00      2.56  12.02    12.02     0.00   4.69 100.00
> >
> >
> > Use fio with 4M block size for write:
> >
> > Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> > rbd3       0.00   40.00    0.00  40.00    0.00  40.00   2048.00      2.87  70.90     0.00    70.90  24.90  99.60
> >
> >
> > Use fio with 1M or 2M block size for write:
> >
> > Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> > rbd3       0.00    0.00    0.00  80.00    0.00  40.00   1024.00      3.55  48.20     0.00    48.20  12.50 100.00
> >
> >
> > So why is the I/O size here far less than 4096 sectors? (With the default
> > max_sectors_kb of 512, all I/Os show as 1024 sectors.) Are there other
> > parameters that need adjusting, or is it about this kernel version?
>
> It's about this kernel version. Assuming you are doing direct I/Os
> with fio, setting max_sectors_kb to 4096 is really the only thing you
> can do, and that's enough to *sometimes* see 8192 sector (i.e. 4M) I/Os.
> The problem is the max_segments value, which in 3.10 is 128 and which
> you cannot adjust via sysfs.
>
> It all comes down to a memory allocator. To get a 4M I/O, the total
> number of segments (physically contiguous chunks of memory) in the
> 8 bios (8*512k = 4M) that need to be merged has to be <= 128. When you
> are allocated such nice and contiguous bios, you get 4M I/Os. In other
> cases you don't.
>
> This will be fixed in 4.2, along with a bunch of other things. This
> particular max_segments fix is a one liner, so we will probably backport
> it to older kernels, including 3.10.
>
> Thanks,
>
> Ilya
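
The arithmetic behind the explanation above, sketched with the numbers from the thread (illustrative only, not code from any kernel source):

```shell
# A 4M I/O is assembled by merging 8 x 512k bios; on a 3.10 kernel the merge
# only yields a single 4M request if the bios' combined segment count is <= 128.
bio_kb=512
bios=8
max_segments=128
io_kb=$((bio_kb * bios))             # 4096 KiB = 4 MiB total
io_sectors=$((io_kb * 2))            # in 512-byte sectors: 8192
avg_seg_kb=$((io_kb / max_segments)) # segments must average >= 32 KiB each
echo "io=${io_kb}KiB (${io_sectors} sectors); avg segment >= ${avg_seg_kb}KiB"
```

In other words, a merged 4M request requires the allocator to have handed out, on average, at least 32 KiB of physically contiguous memory per segment, which is why the 8192-sector I/Os appear only sometimes.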
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
