On 02/03/2012 05:03 PM, Yehuda Sadeh Weinraub wrote:
On Fri, Feb 3, 2012 at 3:33 PM, Jim Schutt<jaschut@xxxxxxxxxx> wrote:
You can try running 'iostat -t -kx -d 1' on the OSDs and see whether %util reaches 100%, and, when it does, whether that is due to a large number of I/O operations thrashing the disks or to a high volume of data. FWIW, you may also try setting 'filestore flusher = false' and setting '/proc/sys/vm/dirty_background_ratio' to a small number (e.g., 1M).
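One rough way to apply that distinction to a single iostat row is to look at avgrq-sz (reported in 512-byte sectors) once %util is near 100. The sketch below is only an illustration: the function name and both thresholds are assumptions, not values taken from sysstat or Ceph.

```python
def classify_device(util_pct, avgrq_sectors,
                    busy_util=95.0, small_request_sectors=64):
    """Roughly label one 'iostat -x' device row.

    busy_util and small_request_sectors are illustrative thresholds
    (assumptions), not values from sysstat or Ceph.
    avgrq_sectors is the avgrq-sz column, in 512-byte sectors.
    """
    if util_pct < busy_util:
        return "not saturated"
    if avgrq_sectors <= small_request_sectors:
        # Busy disk, small average request: many small operations.
        return "saturated by many small requests"
    # Busy disk, large average request: bulk data transfer.
    return "saturated by large transfers"


# Values in the style of an iostat row: %util 100.10, avgrq-sz 1009.79
print(classify_device(100.10, 1009.79))
```

With a cutoff of 64 sectors (32 KiB), requests of roughly 1000 sectors (about 500 KiB) fall on the "large transfers" side; the cutoff should be tuned to the workload.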
Here's some iostat data from early in a run, when things are running well:

02/02/2012 09:14:13 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.24    0.00   61.99    7.38    0.00    7.38

Device   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sdc        0.00     0.00     0.00   206.00     0.00   101.57  1009.79    54.80   251.27     4.86   100.10
sdd        0.00     0.00     0.00   202.00     0.00    98.10   994.61    27.85   132.42     4.96   100.10
sde        0.00     4.00     0.00   212.00     0.00   105.09  1015.25    96.06   588.43     4.72   100.10
sdh        0.00     0.00     0.00   200.00     0.00    97.11   994.40    69.77   535.01     5.00   100.10
sdg        0.00     2.00     0.00   221.00     0.00   109.59  1015.60    82.05   298.71     4.53   100.10
sda        0.00     1.00     0.00   212.00     0.00    83.93   810.75    18.26    84.82     4.68    99.30
sdf        0.00     0.00     0.00   208.00     0.00   102.55  1009.73    77.23   383.19     4.50    93.70
sdb        0.00     0.00     0.00   205.00     0.00    98.66   985.68    19.97   133.98     4.84    99.20
sdj        0.00     0.00     0.00   202.00     0.00    99.59  1009.66    69.97   257.47     4.95   100.00
sdk        0.00     0.00     0.00   204.00     0.00    98.10   984.86    20.83   100.34     4.87    99.30
sdm        0.00     0.00     0.00   216.00     0.00   106.55  1010.22    77.73   268.67     4.63   100.00
sdn        0.00     0.00     0.00   205.00     0.00    98.60   985.05    19.33    95.88     4.81    98.60
sdo        0.00     0.00     0.00   232.00     0.00   106.25   937.93    23.26    82.19     4.29    99.50
sdl        0.00     0.00     0.00   181.00     0.00    85.12   963.09    24.73   131.71     4.80    86.80
sdp        0.00     4.00     0.00   207.00     0.00    87.41   864.77    37.01   111.13     4.49    93.00
sdi        0.00     0.00     0.00   208.00     0.00   103.04  1014.54    72.30   263.72     4.70    97.70
sdr        0.00     0.00     0.00   191.00     0.00    76.75   822.95    11.51    83.69     4.59    87.60
sds        0.00     0.00     0.00   209.00     0.00   101.91   998.58    49.95   278.08     4.70    98.20
sdt        0.00     0.00     0.00   209.00     0.00    99.57   975.69    27.31   157.44     4.79   100.10
sdu        0.00     0.00     0.00   216.00     0.00   107.09  1015.41    79.82   345.88     4.63   100.10
sdw        0.00     0.00     0.00   208.00     0.00   103.09  1015.00    74.55   308.15     4.81   100.10
sdv        0.00     0.00     0.00   201.00     0.00    98.05   999.08    76.87   265.88     4.98   100.10
sdx        0.00     0.00     0.00   202.00     0.00   100.50  1018.93   110.40   327.68     4.96   100.10
sdq        0.00     0.00     0.00   228.00     0.00   112.59  1011.30    54.84   281.04     4.39   100.10

02/02/2012 09:14:14 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.11    0.00   54.03   15.38    0.00    8.48

Device   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sdc        0.00     0.00     0.00   233.00     0.00    99.68   876.15    95.98   384.42     4.29   100.00
sdd        0.00     0.00     0.00   205.00     0.00    96.64   965.46    20.37   108.51     4.84    99.30
sde        0.00     0.00     0.00   225.00     0.00    99.54   906.03    92.38   420.67     4.44   100.00
sdh        0.00     0.00     0.00   198.00     0.00    97.05  1003.84    79.39   410.56     5.05   100.00
sdg        0.00     0.00     0.00   245.00     0.00   108.38   905.99    84.40   385.47     4.08   100.00
sda        0.00     4.00     0.00   220.00     0.00    96.23   895.78    63.24   294.59     4.44    97.60
sdf        0.00     0.00     0.00   216.00     0.00   107.09  1015.41    87.67   399.14     4.57    98.80
sdb        0.00     0.00     0.00   156.00     0.00    72.05   945.95    11.61    58.94     4.84    75.50
sdj        0.00     0.00     0.00   199.00     0.00    95.41   981.95    56.28   366.11     4.84    96.40
sdk        0.00     0.00     0.00   206.00     0.00   100.14   995.57    54.69   241.41     4.86   100.10
sdm        0.00     0.00     0.00   200.00     0.00    99.09  1014.72    79.51   506.47     4.74    94.70
sdn        0.00     0.00     0.00   191.00     0.00    91.29   978.81    26.82   128.39     5.18    98.90
sdo        0.00     0.00     0.00   234.00     0.00   106.75   934.32    49.82   231.07     4.27   100.00
sdl        0.00     0.00     0.00   214.00     0.00   103.62   991.70    33.03   168.13     4.62    98.80
sdp        0.00     0.00     0.00   219.00     0.00   106.08   992.00    64.69   328.92     4.57   100.00
sdi        0.00     0.00     0.00   210.00     0.00   104.09  1015.09   100.98   421.01     4.76   100.00
sdr        0.00     0.00     0.00   180.00     0.00    81.66   929.07    10.31    63.59     5.12    92.20
sds        0.00     0.00     0.00   201.00     0.00    95.15   969.47    32.60   144.16     4.98   100.00
sdt        0.00     0.00     0.00   198.00     0.00    95.72   990.10    33.26   155.98     4.84    95.90
sdu        0.00     0.00     0.00   219.00     0.00   108.59  1015.53    66.10   347.91     4.57   100.00
sdw        0.00     0.00     0.00   204.00     0.00   100.75  1011.41    81.20   456.47     4.80    98.00
sdv        0.00     0.00     0.00   197.00     0.00    96.09   998.90    44.19   284.65     5.08   100.00
sdx        0.00     0.00     0.00   211.00     0.00   104.19  1011.26    84.87   542.85     4.69    99.00
sdq        0.00     0.00     0.00   216.00     0.00   105.10   996.52    36.63   134.40     4.63   100.00

This is later in the same run, when things are not going as well:

02/02/2012 09:21:52 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.13    0.00   13.31    8.52    0.00   73.04

Device   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sdc        0.00     0.00     0.00    36.00     0.00    16.02   911.11     1.43    39.72     5.64    20.30
sdd        0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.85    47.28     6.39    11.50
sde        0.00     0.00     0.00     4.00     0.00     0.01     6.00     0.08    20.00    13.00     5.20
sdh        0.00     0.00     0.00    20.00     0.00     8.01   820.40     0.65    32.40     5.30    10.60
sdg        0.00     0.00     0.00    19.00     0.00     8.01   863.58     0.60    31.63     4.63     8.80
sda        0.00     0.00     0.00    82.00     0.00    36.04   900.10     3.13    37.05     5.15    42.20
sdf        0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.80    44.22     6.39    11.50
sdb        0.00     8.00     0.00    42.00     0.00     1.75    85.52     0.14     3.43     1.40     5.90
sdj        0.00    16.00     0.00   103.00     0.00    25.64   509.83     2.21    21.36     3.65    37.60
sdk        0.00    14.00     0.00   152.00     0.00    47.93   645.79     3.96    27.31     4.12    62.60
sdm        0.00     0.00     0.00    21.00     0.00     9.39   915.81     0.94    44.57     5.71    12.00
sdn        0.00    34.00     0.00   197.00     0.00    64.61   671.72    28.66    85.62     4.02    79.10
sdo        0.00     0.00     0.00    92.00     0.00    42.54   946.87     6.22    55.58     4.85    44.60
sdl        0.00     0.00     0.00     6.00     0.00     2.01   685.33     0.09    59.67     6.33     3.80
sdp        0.00    10.00     0.00    58.00     0.00     9.56   337.52     1.20    20.60     3.05    17.70
sdi        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdr        0.00     0.00     0.00    37.00     0.00    16.02   886.92     1.19    32.27     5.11    18.90
sds        0.00    18.00     0.00   115.00     0.00    26.54   472.70     4.03    25.94     3.20    36.80
sdt        0.00     0.00     0.00   131.00     0.00    60.05   938.87     6.13    46.33     5.11    67.00
sdu        0.00    12.00     0.00   119.00     0.00    31.40   540.44     2.93    24.65     3.05    36.30
sdw        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdv        0.00     4.00     0.00    63.00     0.00     9.46   307.68     0.83    14.32     2.38    15.00
sdx        0.00     0.00     0.00    35.00     0.00    15.51   907.66     0.79    28.20     4.89    17.10
sdq        0.00     0.00     0.00    37.00     0.00    16.02   886.70     1.52    41.00     5.86    21.70

02/02/2012 09:21:53 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.74    0.00    8.75    6.60    0.00   80.90

Device   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sdc        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdd        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sde        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdh        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdg        0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.88    48.94     6.83    12.30
sda        0.00     0.00     0.00    45.00     0.00     7.38   335.64     0.54    18.87     1.78     8.00
sdf        0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.93    51.44     6.78    12.20
sdb        0.00     0.00     0.00     5.00     0.00     0.74   302.40     0.05    10.20     8.20     4.10
sdj        0.00     0.00     0.00    72.00     0.00    32.03   911.11     2.51    34.99     5.01    36.10
sdk        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdm        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdn        0.00     0.00     0.00   123.00     0.00    52.60   875.84    13.83   209.72     4.84    59.50
sdo        0.00     0.00     0.00    13.00     0.00     5.52   868.92     0.30   108.31     4.69     6.10
sdl        0.00     0.00     0.00    27.00     0.00    12.47   945.78     1.33    47.15     6.59    17.80
sdp        0.00     0.00     0.00    11.00     0.00     4.50   838.55     0.51    14.09     5.09     5.60
sdi        0.00     0.00     0.00    19.00     0.00     8.01   863.58     0.72    38.05     5.74    10.90
sdr        0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.69    38.33     5.89    10.60
sds        0.00     0.00     0.00    56.00     0.00    19.66   718.86     1.31    39.16     5.11    28.60
sdt        0.00     0.00     0.00   161.00     0.00    72.57   923.18     6.97    37.39     5.07    81.70
sdu        0.00     0.00     0.00    66.00     0.00    30.02   931.64     2.77    39.85     5.09    33.60
sdw        0.00     0.00     0.00    20.00     0.00     8.51   871.60     1.47    27.80     4.85     9.70
sdv        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdx        0.00     0.00     0.00    36.00     0.00    16.02   911.11     1.37    38.08     5.72    20.60
sdq        0.00     0.00     0.00    44.00     0.00    19.46   906.00     1.15    26.02     4.50    19.80

And finally, this is still later, near the end of the run, when things have recovered somewhat:

02/02/2012 09:22:34 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.25    0.00   52.27   20.88    0.00   11.60

Device   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sdc        0.00     1.00     0.00   217.00     0.00    95.20   898.51    84.43   413.56     4.60    99.90
sdd        0.00     0.00     0.00    40.00     0.00    16.86   863.00     1.59    28.45     5.55    22.20
sde        0.00     0.00     0.00   206.00     0.00    99.27   986.95    89.64   452.92     4.85    99.90
sdh        0.00     0.00     0.00    51.00     0.00    22.53   904.63     2.02    35.45     5.47    27.90
sdg        0.00     0.00     0.00   230.00     0.00   112.49  1001.63    92.87   283.01     4.33    99.60
sda        0.00     0.00     0.00   215.00     0.00   106.10  1010.68    94.45   253.40     4.65    99.90
sdf        0.00     0.00     0.00    73.00     0.00    32.04   898.74     2.20    30.08     5.11    37.30
sdb        0.00     0.00     0.00    92.00     0.00    40.05   891.48     2.55    27.70     4.85    44.60
sdj        0.00    44.00     0.00   280.00     0.00    91.61   670.03   109.32   314.59     3.57    99.90
sdk        0.00     1.00     0.00   210.00     0.00   100.63   981.41    97.79   419.98     4.76    99.90
sdm        0.00    42.00     0.00   282.00     0.00   100.27   728.23    92.86   285.38     3.54    99.90
sdn        0.00     0.00     0.00   213.00     0.00   100.81   969.31    41.62   301.33     4.67    99.40
sdo        0.00    39.00     0.00   306.00     0.00   102.84   688.29    82.44   279.69     3.26    99.70
sdl        0.00     0.00     0.00   219.00     0.00   104.16   974.06    83.05   421.80     4.56    99.90
sdp        0.00    46.00     0.00   277.00     0.00    97.01   717.23   106.44   324.31     3.61    99.90
sdi        0.00     0.00     0.00    56.00     0.00    24.03   878.86     1.73    30.91     5.05    28.30
sdr        0.00    34.00     0.00   266.00     0.00    97.66   751.91    63.86   304.39     3.76   100.00
sds        0.00    18.00     0.00    67.00     0.00    17.41   532.18     1.68    25.03     3.79    25.40
sdt        0.00     0.00     0.00   130.00     0.00    64.01  1008.37    56.33   166.52     4.99    64.90
sdu        0.00     0.00     0.00   197.00     0.00    95.02   987.82    44.70   282.45     4.95    97.60
sdw        0.00     0.00     0.00   207.00     0.00    93.39   923.98    90.21   448.08     4.83    99.90
sdv        0.00     0.00     0.00   204.00     0.00   100.52  1009.14    84.16   425.70     4.85    98.90
sdx        0.00     0.00     0.00   203.00     0.00    88.75   895.33    87.10   475.92     4.92    99.90
sdq        0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.52    28.83     4.83     8.70

02/02/2012 09:22:35 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.63    0.00   50.99   22.22    0.00   12.16

Device   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
sdc        0.00     0.00     0.00   209.00     0.00    99.54   975.35    84.02   409.76     4.78    99.90
sdd        0.00     0.00     0.00    13.00     0.00     5.50   867.08     0.34    57.31     6.23     8.10
sde        0.00     0.00     0.00   204.00     0.00    98.12   985.06    87.28   418.62     4.88    99.50
sdh        0.00     0.00     0.00    78.00     0.00    34.12   895.79     2.15    30.26     5.37    41.90
sdg        0.00     0.00     0.00   226.00     0.00   108.48   983.04    93.54   336.46     4.42    99.80
sda        0.00     0.00     0.00   219.00     0.00   108.07  1010.63    80.90   510.96     4.53    99.20
sdf        0.00     6.00     0.00    81.00     0.00    21.20   535.90     1.99    24.47     3.59    29.10
sdb        0.00     0.00     0.00    71.00     0.00    32.03   923.94     2.46    34.63     4.65    33.00
sdj        0.00     0.00     0.00   192.00     0.00    83.87   894.62    83.33   459.53     5.21   100.10
sdk        0.00    41.00     0.00   285.00     0.00    94.12   676.32   104.34   310.17     3.51   100.10
sdm        0.00     0.00     0.00   202.00     0.00    90.44   916.91    86.45   506.52     4.96   100.10
sdn        0.00     0.00     0.00   208.00     0.00   101.48   999.23    87.79   323.35     4.79    99.70
sdo        0.00     1.00     0.00   228.00     0.00   108.63   975.75    89.79   327.24     4.38    99.80
sdl        0.00    28.00     0.00   270.00     0.00    97.64   740.65    52.06   281.67     3.54    95.60
sdp        0.00     0.00     0.00   195.00     0.00    85.65   899.57    92.28   453.54     5.14   100.20
sdi        0.00    14.00     0.00    31.00     0.00     9.02   595.61     0.96    30.94     4.77    14.80
sdr        0.00     0.00     0.00   192.00     0.00    83.11   886.46    14.22   142.39     5.06    97.10
sds        0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.73    40.39     5.89    10.60
sdt        0.00     0.00     0.00   201.00     0.00    98.66  1005.29    65.87   425.37     4.89    98.30
sdu        0.00     0.00     0.00   209.00     0.00   103.01  1009.38    87.49   285.51     4.74    99.10
sdw        0.00     0.00     0.00   204.00     0.00    96.74   971.22    82.66   410.50     4.89    99.70
sdv        0.00     0.00     0.00   198.00     0.00    96.61   999.23    83.39   420.17     5.03    99.50
sdx        0.00     0.00     0.00   204.00     0.00    98.79   991.80    86.54   428.67     4.90   100.00
sdq        0.00     0.00     0.00    36.00     0.00    16.02   911.11     0.88    24.33     4.44    16.00

The above suggests to me that the slowdown is a result of requests not getting submitted at the same rate as when things are running well.

-- Jim
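To compare snapshots like these without reading every row, the per-device lines can be reduced to a few totals. This is a minimal sketch that assumes the 12-column 'iostat -kx' layout used in this message; the function name and the returned keys are made up for illustration.

```python
def summarize_snapshot(lines):
    """Aggregate the device rows of one iostat snapshot.

    Assumes 12 whitespace-separated columns per device row:
    Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz
    await svctm %util
    """
    devices = 0
    idle_devices = 0    # devices with w/s == 0 in this interval
    total_w_mb = 0.0    # sum of wMB/s over all devices
    total_queue = 0.0   # sum of avgqu-sz over all devices
    for line in lines:
        fields = line.split()
        if len(fields) != 12:
            continue  # timestamp, avg-cpu lines, blank lines
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # the "Device ..." header row
        devices += 1
        total_w_mb += values[5]    # wMB/s
        total_queue += values[7]   # avgqu-sz
        if values[3] == 0.0:       # w/s
            idle_devices += 1
    return {
        "devices": devices,
        "idle_devices": idle_devices,
        "total_wMBps": total_w_mb,
        "total_avgqu": total_queue,
    }


# Two rows copied from the 09:14:13 sample
sample = [
    "sdc 0.00 0.00 0.00 206.00 0.00 101.57 1009.79 54.80 251.27 4.86 100.10",
    "sdd 0.00 0.00 0.00 202.00 0.00 98.10 994.61 27.85 132.42 4.96 100.10",
]
print(summarize_snapshot(sample))
```

A low total queue depth combined with low aggregate wMB/s would be consistent with requests not being submitted, as opposed to the disks failing to keep up.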
Yehuda