Hi Ilya,
Thanks for your explanation. This makes sense. Will you make max_segments configurable? Could you please point me to the fix you have made? We might be able to help test it.

Thanks.
David Zhang

> Date: Fri, 26 Jun 2015 18:21:55 +0300
> Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's
> From: idryomov@xxxxxxxxx
> To: zhangz.david@xxxxxxxxxxx
> CC: ceph-users@xxxxxxxxxxxxxx
>
> On Fri, Jun 26, 2015 at 3:17 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> > Hi Ilya,
> >
> > I am seeing your recent email talking about krbd splitting large IO's into
> > smaller IO's, see below link.
> >
> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg20587.html
> >
> > I just tried it on my ceph cluster using kernel 3.10.0-1. I adjust both
> > max_sectors_kb and max_hw_sectors_kb of rbd device to 4096.
> >
> > Use fio with 4M block size for read:
> >
> > Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await   svctm   %util
> > rbd3      81.00    0.00  135.00    0.00  108.00    0.00   1638.40      2.72   20.15    20.15     0.00    7.41  100.00
> >
> >
> > Use fio with 1M or 2M block size for read:
> >
> > Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await   svctm   %util
> > rbd3       0.00    0.00  213.00    0.00  106.50    0.00   1024.00      2.56   12.02    12.02     0.00    4.69  100.00
> >
> >
> > Use fio with 4M block size for write:
> >
> > Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await   svctm   %util
> > rbd3       0.00   40.00    0.00   40.00    0.00   40.00   2048.00      2.87   70.90     0.00    70.90   24.90   99.60
> >
> >
> > Use fio with 1M or 2M block size for write:
> >
> > Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await   svctm   %util
> > rbd3       0.00    0.00    0.00   80.00    0.00   40.00   1024.00      3.55   48.20     0.00    48.20   12.50  100.00
> >
> >
> > So why the IO size here is far less than 4096 (If using default value 512,
> > all the IO size is 1024)? Is there some other parameters need to adjust, or
> > is it about this kernel version?
>
> It's about this kernel version. Assuming you are doing direct I/Os
> with fio, setting max_sectors_kb to 4096 is really the only thing you
> can do, and that's enough to *sometimes* see 8192 sector (i.e. 4M) I/Os.
> The problem is the max_segments value, which in 3.10 is 128 and which
> you cannot adjust via sysfs.
>
> It all comes down to a memory allocator. To get a 4M I/O, the total
> number of segments (physically contiguous chunks of memory) in the
> 8 bios (8*512k = 4M) that need to be merged has to be <= 128. When you
> are allocated such nice and contiguous bios, you get 4M I/Os. In other
> cases you don't.
>
> This will be fixed in 4.2, along with a bunch of other things. This
> particular max_segment fix is a one liner, so we will probably backport
> it to older kernels, including 3.10.
>
> Thanks,
>
> Ilya
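For reference, a rough sketch of the arithmetic behind the explanation quoted above. It assumes 4 KiB pages and 512-byte sectors (the unit iostat uses for avgrq-sz); neither is stated in the thread, and the function names are made up for illustration. It only works through the numbers and is not a model of what the block layer actually does.

```python
SECTOR_BYTES = 512    # iostat reports avgrq-sz in 512-byte sectors
PAGE_BYTES = 4096     # assumption: 4 KiB pages (not stated in the thread)
MAX_SEGMENTS = 128    # krbd value on 3.10, per Ilya's reply


def sectors_to_kib(sectors):
    """Convert a size in 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES / 1024


def pages_in_request(request_bytes, page_bytes=PAGE_BYTES):
    """Number of pages backing a request of the given size."""
    return request_bytes // page_bytes


def worst_case_request_sectors(max_segments=MAX_SEGMENTS, page_bytes=PAGE_BYTES):
    """Largest request, in sectors, if every segment is one isolated page."""
    return max_segments * page_bytes // SECTOR_BYTES


def min_avg_pages_per_segment(request_bytes, max_segments=MAX_SEGMENTS):
    """Average contiguous run (in pages) needed to stay within max_segments."""
    return pages_in_request(request_bytes) / max_segments


if __name__ == "__main__":
    four_mib = 4 * 1024 * 1024
    print("4M request in sectors:", four_mib // SECTOR_BYTES)
    print("pages in a 4M request:", pages_in_request(four_mib))
    print("worst-case request (sectors):", worst_case_request_sectors())
    print("pages per segment needed for 4M:", min_avg_pages_per_segment(four_mib))
    print("avgrq-sz 1638.40 sectors in KiB:", sectors_to_kib(1638.40))
```

Under these assumptions a 4M request spans 1024 pages, so it fits in 128 segments only if the memory comes in contiguous runs averaging 8 pages (32 KiB). With fully scattered pages, 128 segments cover 512 KiB, i.e. 1024 sectors, which happens to match the avgrq-sz seen in the 1M/2M runs; the thread itself does not draw that connection.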
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com