Re: krbd splitting large IO's into smaller IO's

Hi guys, sorry to jump in on this thread. I have four OSD servers running Ubuntu 14.04.1 LTS, each with 9 OSD daemons on 3TB drives and 3 SSD journal drives (each journal drive serves 3 OSDs). The kernel is 3.18.3-031803-generic and the Ceph version is 0.82. I'd like to know what the 'best' IO parameters would be for my 3TB devices. I currently have:

scheduler: deadline
max_hw_sectors_kb: 16383
max_sectors_kb: 4096
read_ahead_kb: 128
nr_requests: 128
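
For reference, these are applied via sysfs; a rough sketch of how I set them (assuming the OSD data disks are sdc/sdd/sdf/sdg/sdi/sdj/sdl/sdm/sdn, per the iostat output below; run as root):

for dev in sdc sdd sdf sdg sdi sdj sdl sdm sdn; do
    echo deadline > /sys/block/$dev/queue/scheduler
    echo 4096 > /sys/block/$dev/queue/max_sectors_kb
    echo 128 > /sys/block/$dev/queue/read_ahead_kb
    echo 128 > /sys/block/$dev/queue/nr_requests
done

(max_hw_sectors_kb is read-only; it just reports the controller limit.)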

I'm experiencing high IO waits on all of the OSD servers:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.74    0.00   15.43   64.80    0.00   18.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda            1610.40   322.20  374.80   11.00  7940.80  1330.00    48.06     0.08    0.21    0.20    0.44   0.20   7.68
sdb             130.60   322.20   55.00   11.00   742.40  1330.00    62.80     0.02    0.23    0.17    0.51   0.19   1.28
md0               0.00     0.00 2170.80  332.40  8683.20  1329.60     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00 2170.80  332.40  8683.20  1329.60     8.00     0.87    0.35    0.21    1.26   0.03   7.84
sdd               0.00     0.00   11.80  384.40  4217.60 33197.60   188.87    75.17  189.72  130.78  191.53   1.88  74.64
sdc               0.00     0.00   18.80  313.40   581.60 33154.40   203.11    78.09  235.08   66.85  245.17   2.16  71.84
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.80   78.20  181.40 10400.80 19204.80   228.09    31.75  110.93   43.09  140.18   2.99  77.52
sdg               0.00     0.00    1.60  304.60    51.20 31647.20   207.04    64.05  209.19   73.50  209.90   1.90  58.32
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    6.60   17.20   159.20  2784.80   247.39     0.27    9.14   12.12    8.00   3.19   7.60
sdk               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00   13.40  120.00   428.80  8487.20   133.67    23.91  203.37   36.18  222.04   2.64  35.28
sdl               0.00     0.80   12.40  524.20  2088.80 40842.40   160.01    93.53  168.27  183.35  167.91   1.64  88.24
sdn               0.00     1.40    4.00  433.80    92.80 35926.40   164.55    88.72  196.29  299.40  195.33   1.71  74.96
sdm               0.00     0.00    0.60  544.60    19.20 40348.00   148.08   118.31  217.00   17.33  217.22   1.67  90.80


Thanks in advance,

Best regards,


German Anders
Storage System Engineer Leader
Despegar | IT Team
office +54 11 4894 3500 x3408
mobile +54 911 3493 7262
mail ganders@xxxxxxxxxxxx

2015-06-10 13:07 GMT-03:00 Ilya Dryomov <idryomov@xxxxxxxxx>:
On Wed, Jun 10, 2015 at 7:04 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> > >> -----Original Message-----
>> > >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
>> > >> Sent: 10 June 2015 14:06
>> > >> To: Nick Fisk
>> > >> Cc: ceph-users
>> > >> Subject: Re: krbd splitting large IO's into smaller
>> > >> IO's
>> > >>
>> > >> On Wed, Jun 10, 2015 at 2:47 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> > >> > Hi,
>> > >> >
>> > >> > Using the kernel RBD client with kernel 4.0.3 (I have also tried
>> > >> > some older kernels with the same effect), IO is being split into
>> > >> > smaller IOs, which is having a negative impact on performance.
>> > >> >
>> > >> > cat /sys/block/sdc/queue/max_hw_sectors_kb
>> > >> > 4096
>> > >> >
>> > >> > cat /sys/block/rbd0/queue/max_sectors_kb
>> > >> > 4096
>> > >> >
>> > >> > Using DD
>> > >> > dd if=/dev/rbd0 of=/dev/null bs=4M
>> > >> >
>> > >> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> > >> > rbd0              0.00     0.00  201.50    0.00 25792.00     0.00   256.00     1.99   10.15   10.15    0.00   4.96 100.00
>> > >> >
>> > >> >
>> > >> > Using FIO with 4M blocks
>> > >> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> > >> > rbd0              0.00     0.00  232.00    0.00 118784.00     0.00  1024.00    11.29   48.58   48.58    0.00   4.31 100.00
>> > >> >
>> > >> > Any ideas why IO sizes are limited to 128k (256 sectors) in dd's
>> > >> > case and 512k in fio's case?
>> > >>
>> > >> 128k vs 512k is probably buffered vs direct IO - add iflag=direct
>> > >> to your dd invocation.
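>> > >> E.g., reusing your command (buffered reads get sized by the
>> > >> read_ahead_kb default of 128k, which would explain the 128k):
>> > >>
>> > >>   dd if=/dev/rbd0 of=/dev/null bs=4M iflag=direct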
>> > >
>> > > Yes, thanks for this, that was the case
>> > >
>> > >>
>> > >> As for the 512k - I'm pretty sure it's a regression in our switch
>> > >> to blk-mq.  I tested it around 3.18-3.19 and saw steady 4M IOs.  I
>> > >> hope we are just missing a knob - I'll take a look.
>> > >
>> > > I've tested both 4.0.3 and 3.16 and both seem to be split into 512k.
>> > > Let me know if you need me to test any other particular version.
>> >
>> > With 3.16 you are going to need to adjust max_hw_sectors_kb /
>> > max_sectors_kb as discussed in Dan's thread.  The patch that fixed
>> > that in the block layer went into 3.19, blk-mq into 4.0 - try 3.19.
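>> > For example (rbd0 as in your output; needs root, and 4096 matching
>> > your max_hw_sectors_kb):
>> >
>> >   echo 4096 > /sys/block/rbd0/queue/max_sectors_kb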
>>
>> Sorry should have mentioned, I had adjusted both of them on the 3.16
>> kernel to 4096.
>> I will try 3.19 and let you know.
>
> Better with 3.19, but should I not be seeing around 8192, or am I getting my
> blocks and bytes mixed up?
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> rbd0             72.00     0.00   24.00    0.00 49152.00     0.00  4096.00     1.96   82.67   82.67    0.00  41.58  99.80

I'd expect 8192.  I'm getting a box for investigation.
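
(For reference: avgrq-sz is in 512-byte sectors, so 8192 sectors = 4M per
request, while the 4096 above works out to 2M -- consistent with
49152 rkB/s / 24 r/s = 2048k.)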

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

