I did some additional testing: I stopped Kafka on the host and kicked off a disk check, and it ran at the expected speed overnight. I started Kafka this morning, and the RAID check's speed immediately dropped to ~2000K/sec. I then enabled the write-back cache on the drives (hdparm -W1 /dev/sd*). The RAID check is now running between 100000K/sec and 200000K/sec and has been for several hours (it fluctuates, but seems to stay within that range).

Write-back cache is NOT enabled on the drives of the hosts we haven't upgraded yet, but their speeds are similar: I kicked off a RAID check on one of our CentOS 6 hosts as well, and the window on that host seems to be 150000-200000K/sec.

Kelly

On 2016-05-27, 9:21 AM, "Kelly Lesperance" <klesperance@xxxxxxxxxxxxxx> wrote:

>All of our Kafka clusters are fairly write-heavy. The cluster in question is our second-heaviest; we haven’t yet upgraded the heaviest due to the issues we’ve been experiencing in this one.
>
>Here is an iostat example from a host within the same cluster, but without the RAID check running:
>
>[root@r2k1 ~] # iostat -xdmc 1 10
>Linux 3.10.0-327.13.1.el7.x86_64 (r2k1) 05/27/16 _x86_64_ (32 CPU)
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           8.87    0.02    1.28    0.21    0.00   89.62
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.02     0.55     0.15    27.06     0.03    11.40   859.89     1.02    37.40    36.13    37.41     6.86    18.65
>sdf          0.02     0.48     0.15    26.99     0.03    11.40   862.17     0.15     5.56    40.94     5.37     7.27    19.73
>sdk          0.03     0.58     0.22    27.10     0.03    11.40   857.01     1.60    58.49    36.20    58.67     7.17    19.58
>sdb          0.02     0.52     0.15    27.43     0.03    11.40   848.37     0.02     0.78    42.84     0.55     7.07    19.50
>sdj          0.02     0.55     0.15    27.11     0.03    11.40   858.28     0.62    22.70    41.97    22.59     7.43    20.27
>sdg          0.03     0.68     0.22    27.76     0.03    11.40   836.98     0.76    27.10    34.36    27.04     7.33    20.51
>sde          0.03     0.48     0.22    26.99     0.03    11.40   860.43     0.33    12.07    33.16    11.90     7.34    19.98
>sda          0.03     0.52     0.22    27.43     0.03    11.40   846.65     0.57    20.48    36.42    20.35     7.34    20.31
>sdh          0.02     0.68     0.15    27.76     0.03    11.40   838.63     0.47    16.66    40.96    16.53     7.20    20.09
>sdc          0.03     0.55     0.22    27.06     0.03    11.40   858.19     0.74    27.30    36.96    27.22     7.55    20.58
>sdi          0.03     0.53     0.22    27.13     0.03    11.40   856.04     1.60    58.50    27.43    58.75     5.21    14.24
>sdl          0.02     0.56     0.15    27.11     0.03    11.40   858.27     1.12    41.09    27.89    41.16     5.00    13.63
>md127        0.00     0.00     2.53   161.84     0.36    68.39   856.56     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          13.11    0.00    1.82    1.07    0.00   84.01
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00    81.00     0.00    38.48   972.95    51.00   219.06     0.00   219.06     6.37    51.60
>sdf          0.00     1.00     0.00    73.00     0.00    33.70   945.33    55.02   235.86     0.00   235.86     7.12    52.00
>sdk          0.00     1.00     0.00    56.00     0.00    25.70   939.73    60.45   223.79     0.00   223.79     9.29    52.00
>sdb          0.00     2.00     0.00    70.00     0.00    34.48  1008.70    58.88   292.81     0.00   292.81     7.37    51.60
>sdj          0.00     3.00     0.00    62.00     0.00    29.87   986.60    59.32   243.48     0.00   243.48     8.26    51.20
>sdg          0.00     1.00     0.00    49.00     0.00    23.43   979.45    60.37   234.98     0.00   234.98    10.53    51.60
>sde          0.00     1.00     0.00    61.00     0.00    27.95   938.38    58.17   239.57     0.00   239.57     8.52    52.00
>sda          0.00     2.00     0.00    56.00     0.00    27.48  1004.88    56.27   202.88     0.00   202.88     9.27    51.90
>sdh          0.00     1.00     0.00    70.00     0.00    33.57   982.19    59.00   277.84     0.00   277.84     7.43    52.00
>sdc          0.00     0.00     0.00    64.00     0.00    30.06   961.89    58.20   268.30     0.00   268.30     8.08    51.70
>sdi          0.00     3.00     0.00   116.00     0.00    55.62   981.94    44.54   199.72     0.00   199.72     4.56    52.90
>sdl          0.00     1.00     0.00   128.00     0.00    60.31   964.88    43.91   215.94     0.00   215.94     4.11    52.60
>md127        0.00     0.00     0.00  1143.00     0.00   538.90   965.59     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          15.70    0.00    1.97    0.44    0.00   81.89
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00   119.00     0.00    56.39   970.42    42.84   639.45     0.00   639.45     6.66    79.20
>sdf          0.00     1.00     0.00   129.00     0.00    61.21   971.84    48.89   672.04     0.00   672.04     6.34    81.80
>sdk          0.00     0.00     0.00   152.00     0.00    72.62   978.53    61.02   716.76     0.00   716.76     5.74    87.20
>sdb          0.00     1.00     0.00   133.00     0.00    62.86   967.88    54.10   695.35     0.00   695.35     6.45    85.80
>sdj          0.00     0.00     0.00   146.00     0.00    68.36   958.85    69.22   767.12     0.00   767.12     6.85   100.00
>sdg          0.00     0.00     0.00   146.00     0.00    69.87   980.11    77.99   789.53     0.00   789.53     6.85   100.00
>sde          0.00     1.00     0.00   141.00     0.00    66.96   972.60    56.21   707.61     0.00   707.61     6.21    87.60
>sda          0.00     1.00     0.00   147.00     0.00    69.86   973.22    62.21   728.76     0.00   728.76     6.32    92.90
>sdh          0.00     0.00     0.00   134.00     0.00    62.61   956.90    55.79   711.49     0.00   711.49     6.63    88.90
>sdc          0.00     0.00     0.00   136.00     0.00    64.81   975.94    61.46   753.57     0.00   753.57     6.93    94.20
>sdi          0.00     0.00     0.00    93.00     0.00    42.67   939.61    17.60   419.10     0.00   419.10     4.63    43.10
>sdl          0.00     0.00     0.00    80.00     0.00    38.02   973.20    11.00   340.79     0.00   340.79     4.25    34.00
>md127        0.00     0.00     0.00    87.00     0.00    40.99   964.97     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          12.11    0.00    1.35    0.00    0.00   86.54
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.01    15.00     0.00    15.00    15.00     1.50
>sdf          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.01    11.00     0.00    11.00    11.00     1.10
>sdk          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.01    11.00     0.00    11.00    11.00     1.10
>sdb          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.01     7.00     0.00     7.00     7.00     0.70
>sdj          0.00     0.00     0.00     2.00     0.00     0.06    64.50     0.01   733.50     0.00   733.50     7.50     1.50
>sdg          0.00     0.00     0.00    10.00     0.00     2.88   588.90     0.55  1212.80     0.00  1212.80    15.50    15.50
>sde          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.01    12.00     0.00    12.00    12.00     1.20
>sda          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.01    11.00     0.00    11.00    11.00     1.10
>sdh          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.02    20.00     0.00    20.00    20.00     2.00
>sdc          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.02    17.00     0.00    17.00    17.00     1.70
>sdi          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.01    12.00     0.00    12.00    12.00     1.20
>sdl          0.00     0.00     0.00     1.00     0.00     0.00     1.00     0.02    17.00     0.00    17.00    17.00     1.70
>md127        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          15.22    0.00    1.50    0.00    0.00   83.28
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdf          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdk          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdj          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdg          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sda          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdh          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdc          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdl          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>md127        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          16.96    0.09    1.63    0.16    0.00   81.16
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00     8.00     0.00     0.66   168.25     0.09    11.50     0.00    11.50     8.75     7.00
>sdf          0.00     0.00     0.00     5.00     0.00     0.52   213.20     0.08    16.20     0.00    16.20    16.20     8.10
>sdk          0.00     0.00     0.00     3.00     0.00     0.50   342.00     0.06    20.33     0.00    20.33    20.33     6.10
>sdb          0.00     0.00     0.00     3.00     0.00     0.50   342.00     0.05    16.67     0.00    16.67    16.67     5.00
>sdj          0.00     0.00     0.00     4.00     0.00     0.98   500.50     0.06    14.50     0.00    14.50    11.00     4.40
>sdg          0.00     1.00     0.00     4.00     0.00     0.63   322.50     0.14    36.00     0.00    36.00    32.75    13.10
>sde          0.00     0.00     0.00     5.00     0.00     0.52   213.20     0.07    13.60     0.00    13.60    13.60     6.80
>sda          0.00     0.00     0.00     3.00     0.00     0.50   342.00     0.05    15.67     0.00    15.67    15.67     4.70
>sdh          0.00     1.00     0.00     4.00     0.00     0.63   322.50     0.06    14.50     0.00    14.50    11.50     4.60
>sdc          0.00     0.00     0.00     8.00     0.00     0.66   168.25     0.11    13.25     0.00    13.25    10.62     8.50
>sdi          0.00     0.00     0.00     4.00     0.00     0.98   500.50     0.06    15.50     0.00    15.50    12.00     4.80
>sdl          0.00     0.00     0.00     3.00     0.00     0.50   342.00     0.04    13.67     0.00    13.67    13.67     4.10
>md127        0.00     0.00     0.00    17.00     0.00     3.78   455.53     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          14.08    0.00    1.50    0.00    0.00   84.42
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdf          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdk          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdj          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdg          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sda          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdh          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdc          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdl          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>md127        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          14.89    0.00    1.98    0.00    0.00   83.13
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00    90.00     0.00    41.31   940.01    27.25   302.80     0.00   302.80     7.07    63.60
>sdf          0.00     0.00     0.00    87.00     0.00    41.35   973.44    22.73   261.30     0.00   261.30     6.92    60.20
>sdk          0.00     2.00     0.00    97.00     0.00    42.08   888.42    39.86   410.94     0.00   410.94     8.10    78.60
>sdb          0.00     0.00     0.00    87.00     0.00    41.07   966.82    24.39   280.30     0.00   280.30     7.14    62.10
>sdj          0.00     1.00     0.00    91.00     0.00    41.94   943.92    36.37   399.62     0.00   399.62     8.44    76.80
>sdg          0.00     0.00     0.00    86.00     0.00    40.67   968.48    31.76   369.33     0.00   369.33     8.81    75.80
>sde          0.00     0.00     0.00    87.00     0.00    41.35   973.44    30.80   354.05     0.00   354.05     9.01    78.40
>sda          0.00     0.00     0.00    87.00     0.00    41.07   966.82    32.61   374.80     0.00   374.80     8.57    74.60
>sdh          0.00     0.00     0.00    86.00     0.00    40.67   968.48    29.52   343.23     0.00   343.23     8.56    73.60
>sdc          0.00     0.00     0.00    89.00     0.00    40.81   939.07    32.80   360.15     0.00   360.15     8.91    79.30
>sdi          0.00     1.00     0.00    91.00     0.00    41.94   943.92    19.60   215.34     0.00   215.34     5.62    51.10
>sdl          0.00     2.00     0.00    97.00     0.00    42.08   888.42    19.59   201.93     0.00   201.93     4.69    45.50
>md127        0.00     0.00     0.00   535.00     0.00   248.42   950.95     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          11.08    0.00    1.41    0.00    0.00   87.51
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     5.00     0.00    42.00     0.00     0.38    18.55     2.25    53.52     0.00    53.52     4.93    20.70
>sdf          0.00     0.00     0.00    35.00     0.00     0.21    12.43     1.62    46.17     0.00    46.17     5.29    18.50
>sdk          0.00    23.00     0.00    42.00     0.00     0.44    21.40     1.99    47.29     0.00    47.29     4.64    19.50
>sdb          0.00     9.00     0.00    58.00     0.00     0.34    12.02     2.77    47.78     0.00    47.78     4.12    23.90
>sdj          0.00     1.00     0.00    39.00     0.00     0.24    12.79     1.79    45.97     0.00    45.97     5.21    20.30
>sdg          0.00    11.00     0.00    66.00     0.00     0.40    12.45     3.60    54.47     0.00    54.47     3.42    22.60
>sde          0.00     0.00     0.00    35.00     0.00     0.21    12.43     2.13    61.00     0.00    61.00     8.89    31.10
>sda          0.00     9.00     0.00    58.00     0.00     0.34    12.02     2.48    42.81     0.00    42.81     3.71    21.50
>sdh          0.00    11.00     0.00    66.00     0.00     0.40    12.45     4.81    72.83     0.00    72.83     3.80    25.10
>sdc          0.00     5.00     0.00    43.00     0.00     0.88    41.93     1.99    63.81     0.00    63.81     5.00    21.50
>sdi          0.00     1.00     0.00    39.00     0.00     0.24    12.79     1.31    33.69     0.00    33.69     4.03    15.70
>sdl          0.00    23.00     0.00    42.00     0.00     0.44    21.40     1.23    29.33     0.00    29.33     3.71    15.60
>md127        0.00     0.00     0.00   313.00     0.00     2.01    13.14     0.00     0.00     0.00     0.00     0.00     0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          16.16    0.03    1.66    0.00    0.00   82.15
>
>Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>sdd          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdf          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdk          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdj          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdg          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sda          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdh          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdc          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>sdl          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>md127        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>
>On 2016-05-26, 11:50 PM, "centos-bounces@xxxxxxxxxx on behalf of Gordon Messmer" <centos-bounces@xxxxxxxxxx on behalf of gordon.messmer@xxxxxxxxx> wrote:
>
>>On 05/25/2016 09:54 AM, Kelly Lesperance wrote:
>>> What we're seeing is that when the weekly raid-check script executes, performance nose dives, and I/O wait skyrockets. The raid check starts out fairly fast (20000K/sec - the limit that's been set), but then quickly drops down to about 4000K/Sec. dev.raid.speed sysctls are at the defaults:
>>
>>It looks like some pretty heavy writes are going on at the time. I'm not
>>sure what you mean by "nose dives", but I'd expect *some* performance
>>impact of running a read-intensive process like a RAID check at the same
>>time you're running a write-intensive process.
>>
>>Do the same write-heavy processes run on the other clusters, where you
>>aren't seeing performance issues?
>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>            9.24    0.00    1.32   20.02    0.00   69.42
>>>
>>> Device:       tps  kB_read/s  kB_wrtn/s   kB_read   kB_wrtn
>>> sda         50.00     512.00   20408.00       512     20408
>>> sdb         50.00     512.00   20408.00       512     20408
>>> sdc         48.00     512.00   19984.00       512     19984
>>> sdd         48.00     512.00   19984.00       512     19984
>>> sdf         50.00     704.00   19968.00       704     19968
>>> sdg         47.00     512.00   19968.00       512     19968
>>> sdh         47.00     512.00   19968.00       512     19968
>>> sde         50.00     704.00   19968.00       704     19968
>>> sdj         48.00     512.00   19972.00       512     19972
>>> sdi         48.00     512.00   19972.00       512     19972
>>> sdk         48.00     512.00   19980.00       512     19980
>>> sdl         48.00     512.00   19980.00       512     19980
>>> md127      241.00       0.00  120280.00         0    120280
>>
>>_______________________________________________
>>CentOS mailing list
>>CentOS@xxxxxxxxxx
>>https://lists.centos.org/mailman/listinfo/centos
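Picking the saturated members out of a long iostat -x capture like the one quoted above can be scripted. This sketch assumes the sysstat layout shown in this thread (a "Device:" header line, one whitespace-separated row per device, a blank line after each block) and a locale that prints decimal points. It keys on header names rather than column positions because other sysstat versions rename columns. parse_iostat_x and busy_devices are hypothetical names, and the 90% %util threshold is an arbitrary choice.

```python
# Hypothetical helpers for the `iostat -x` layout quoted in this thread
# (assumed: "Device:" header, one row per device, blank line between blocks,
# decimal points rather than commas in the numbers).


def parse_iostat_x(text):
    """Return one dict per "Device:" block, mapping device -> {column: value}."""
    reports = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current device block
            continue
        if fields[0] == "Device:":
            header = fields[1:]  # key on column names, not positions
            reports.append({})
            continue
        # Only rows with exactly one name plus one value per column count.
        if header and len(fields) == len(header) + 1:
            reports[-1][fields[0]] = dict(zip(header, map(float, fields[1:])))
    return reports


def busy_devices(report, util_threshold=90.0):
    """Sorted names of devices whose %util is at or above util_threshold."""
    return sorted(
        dev for dev, cols in report.items() if cols.get("%util", 0.0) >= util_threshold
    )
```

Fed the -xdmc capture quoted above, busy_devices would flag sda, sdc, sdg and sdj in the third report and nothing in the others (the first report is iostat's since-boot average, not a 1-second sample).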