On Mon, Feb 6, 2012 at 8:20 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> On 02/03/2012 05:03 PM, Yehuda Sadeh Weinraub wrote:
>>
>> On Fri, Feb 3, 2012 at 3:33 PM, Jim Schutt<jaschut@xxxxxxxxxx> wrote:
>
>>
>> You can try running 'iostat -t -kx -d 1' on the osds, and see whether
>> %util reaches 100%, and when it happens whether it's due to number of io
>> operations that are thrashing, or whether it's due to high amount of data.
>> FWIW, you may try setting 'filestore flusher = false', and set
>> /proc/sys/vm/dirty_background_ratio' to a small number (e.g., 1M).
>
> Here's some iostat data from early in a run, when things are
> running well:
>
> 02/02/2012 09:14:13 AM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           23.24    0.00   61.99    7.38    0.00    7.38
>
> Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
> sdc         0.00     0.00     0.00   206.00     0.00   101.57  1009.79    54.80   251.27     4.86   100.10
> sdd         0.00     0.00     0.00   202.00     0.00    98.10   994.61    27.85   132.42     4.96   100.10
> sde         0.00     4.00     0.00   212.00     0.00   105.09  1015.25    96.06   588.43     4.72   100.10
> sdh         0.00     0.00     0.00   200.00     0.00    97.11   994.40    69.77   535.01     5.00   100.10
> sdg         0.00     2.00     0.00   221.00     0.00   109.59  1015.60    82.05   298.71     4.53   100.10
> sda         0.00     1.00     0.00   212.00     0.00    83.93   810.75    18.26    84.82     4.68    99.30
> sdf         0.00     0.00     0.00   208.00     0.00   102.55  1009.73    77.23   383.19     4.50    93.70
> sdb         0.00     0.00     0.00   205.00     0.00    98.66   985.68    19.97   133.98     4.84    99.20
> sdj         0.00     0.00     0.00   202.00     0.00    99.59  1009.66    69.97   257.47     4.95   100.00
> sdk         0.00     0.00     0.00   204.00     0.00    98.10   984.86    20.83   100.34     4.87    99.30
> sdm         0.00     0.00     0.00   216.00     0.00   106.55  1010.22    77.73   268.67     4.63   100.00
> sdn         0.00     0.00     0.00   205.00     0.00    98.60   985.05    19.33    95.88     4.81    98.60
> sdo         0.00     0.00     0.00   232.00     0.00   106.25   937.93    23.26    82.19     4.29    99.50
> sdl         0.00     0.00     0.00   181.00     0.00    85.12   963.09    24.73   131.71     4.80    86.80
> sdp         0.00     4.00     0.00   207.00     0.00    87.41   864.77    37.01   111.13     4.49    93.00
> sdi         0.00     0.00     0.00   208.00     0.00   103.04  1014.54    72.30   263.72     4.70    97.70
> sdr         0.00     0.00     0.00   191.00     0.00    76.75   822.95    11.51    83.69     4.59    87.60
> sds         0.00     0.00     0.00   209.00     0.00   101.91   998.58    49.95   278.08     4.70    98.20
> sdt         0.00     0.00     0.00   209.00     0.00    99.57   975.69    27.31   157.44     4.79   100.10
> sdu         0.00     0.00     0.00   216.00     0.00   107.09  1015.41    79.82   345.88     4.63   100.10
> sdw         0.00     0.00     0.00   208.00     0.00   103.09  1015.00    74.55   308.15     4.81   100.10
> sdv         0.00     0.00     0.00   201.00     0.00    98.05   999.08    76.87   265.88     4.98   100.10
> sdx         0.00     0.00     0.00   202.00     0.00   100.50  1018.93   110.40   327.68     4.96   100.10
> sdq         0.00     0.00     0.00   228.00     0.00   112.59  1011.30    54.84   281.04     4.39   100.10
>
> 02/02/2012 09:14:14 AM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           22.11    0.00   54.03   15.38    0.00    8.48
>
> Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
> sdc         0.00     0.00     0.00   233.00     0.00    99.68   876.15    95.98   384.42     4.29   100.00
> sdd         0.00     0.00     0.00   205.00     0.00    96.64   965.46    20.37   108.51     4.84    99.30
> sde         0.00     0.00     0.00   225.00     0.00    99.54   906.03    92.38   420.67     4.44   100.00
> sdh         0.00     0.00     0.00   198.00     0.00    97.05  1003.84    79.39   410.56     5.05   100.00
> sdg         0.00     0.00     0.00   245.00     0.00   108.38   905.99    84.40   385.47     4.08   100.00
> sda         0.00     4.00     0.00   220.00     0.00    96.23   895.78    63.24   294.59     4.44    97.60
> sdf         0.00     0.00     0.00   216.00     0.00   107.09  1015.41    87.67   399.14     4.57    98.80
> sdb         0.00     0.00     0.00   156.00     0.00    72.05   945.95    11.61    58.94     4.84    75.50
> sdj         0.00     0.00     0.00   199.00     0.00    95.41   981.95    56.28   366.11     4.84    96.40
> sdk         0.00     0.00     0.00   206.00     0.00   100.14   995.57    54.69   241.41     4.86   100.10
> sdm         0.00     0.00     0.00   200.00     0.00    99.09  1014.72    79.51   506.47     4.74    94.70
> sdn         0.00     0.00     0.00   191.00     0.00    91.29   978.81    26.82   128.39     5.18    98.90
> sdo         0.00     0.00     0.00   234.00     0.00   106.75   934.32    49.82   231.07     4.27   100.00
> sdl         0.00     0.00     0.00   214.00     0.00   103.62   991.70    33.03   168.13     4.62    98.80
> sdp         0.00     0.00     0.00   219.00     0.00   106.08   992.00    64.69   328.92     4.57   100.00
> sdi         0.00     0.00     0.00   210.00     0.00   104.09  1015.09   100.98   421.01     4.76   100.00
> sdr         0.00     0.00     0.00   180.00     0.00    81.66   929.07    10.31    63.59     5.12    92.20
> sds         0.00     0.00     0.00   201.00     0.00    95.15   969.47    32.60   144.16     4.98   100.00
> sdt         0.00     0.00     0.00   198.00     0.00    95.72   990.10    33.26   155.98     4.84    95.90
> sdu         0.00     0.00     0.00   219.00     0.00   108.59  1015.53    66.10   347.91     4.57   100.00
> sdw         0.00     0.00     0.00   204.00     0.00   100.75  1011.41    81.20   456.47     4.80    98.00
> sdv         0.00     0.00     0.00   197.00     0.00    96.09   998.90    44.19   284.65     5.08   100.00
> sdx         0.00     0.00     0.00   211.00     0.00   104.19  1011.26    84.87   542.85     4.69    99.00
> sdq         0.00     0.00     0.00   216.00     0.00   105.10   996.52    36.63   134.40     4.63   100.00
>
>
> This is later in the same run, when things are not going as well:
>
> 02/02/2012 09:21:52 AM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            5.13    0.00   13.31    8.52    0.00   73.04
>
> Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
> sdc         0.00     0.00     0.00    36.00     0.00    16.02   911.11     1.43    39.72     5.64    20.30
> sdd         0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.85    47.28     6.39    11.50
> sde         0.00     0.00     0.00     4.00     0.00     0.01     6.00     0.08    20.00    13.00     5.20
> sdh         0.00     0.00     0.00    20.00     0.00     8.01   820.40     0.65    32.40     5.30    10.60
> sdg         0.00     0.00     0.00    19.00     0.00     8.01   863.58     0.60    31.63     4.63     8.80
> sda         0.00     0.00     0.00    82.00     0.00    36.04   900.10     3.13    37.05     5.15    42.20
> sdf         0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.80    44.22     6.39    11.50
> sdb         0.00     8.00     0.00    42.00     0.00     1.75    85.52     0.14     3.43     1.40     5.90
> sdj         0.00    16.00     0.00   103.00     0.00    25.64   509.83     2.21    21.36     3.65    37.60
> sdk         0.00    14.00     0.00   152.00     0.00    47.93   645.79     3.96    27.31     4.12    62.60
> sdm         0.00     0.00     0.00    21.00     0.00     9.39   915.81     0.94    44.57     5.71    12.00
> sdn         0.00    34.00     0.00   197.00     0.00    64.61   671.72    28.66    85.62     4.02    79.10
> sdo         0.00     0.00     0.00    92.00     0.00    42.54   946.87     6.22    55.58     4.85    44.60
> sdl         0.00     0.00     0.00     6.00     0.00     2.01   685.33     0.09    59.67     6.33     3.80
> sdp         0.00    10.00     0.00    58.00     0.00     9.56   337.52     1.20    20.60     3.05    17.70
> sdi         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdr         0.00     0.00     0.00    37.00     0.00    16.02   886.92     1.19    32.27     5.11    18.90
> sds         0.00    18.00     0.00   115.00     0.00    26.54   472.70     4.03    25.94     3.20    36.80
> sdt         0.00     0.00     0.00   131.00     0.00    60.05   938.87     6.13    46.33     5.11    67.00
> sdu         0.00    12.00     0.00   119.00     0.00    31.40   540.44     2.93    24.65     3.05    36.30
> sdw         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdv         0.00     4.00     0.00    63.00     0.00     9.46   307.68     0.83    14.32     2.38    15.00
> sdx         0.00     0.00     0.00    35.00     0.00    15.51   907.66     0.79    28.20     4.89    17.10
> sdq         0.00     0.00     0.00    37.00     0.00    16.02   886.70     1.52    41.00     5.86    21.70
>
> 02/02/2012 09:21:53 AM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            3.74    0.00    8.75    6.60    0.00   80.90
>
> Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
> sdc         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdd         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sde         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdh         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdg         0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.88    48.94     6.83    12.30
> sda         0.00     0.00     0.00    45.00     0.00     7.38   335.64     0.54    18.87     1.78     8.00
> sdf         0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.93    51.44     6.78    12.20
> sdb         0.00     0.00     0.00     5.00     0.00     0.74   302.40     0.05    10.20     8.20     4.10
> sdj         0.00     0.00     0.00    72.00     0.00    32.03   911.11     2.51    34.99     5.01    36.10
> sdk         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdm         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdn         0.00     0.00     0.00   123.00     0.00    52.60   875.84    13.83   209.72     4.84    59.50
> sdo         0.00     0.00     0.00    13.00     0.00     5.52   868.92     0.30   108.31     4.69     6.10
> sdl         0.00     0.00     0.00    27.00     0.00    12.47   945.78     1.33    47.15     6.59    17.80
> sdp         0.00     0.00     0.00    11.00     0.00     4.50   838.55     0.51    14.09     5.09     5.60
> sdi         0.00     0.00     0.00    19.00     0.00     8.01   863.58     0.72    38.05     5.74    10.90
> sdr         0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.69    38.33     5.89    10.60
> sds         0.00     0.00     0.00    56.00     0.00    19.66   718.86     1.31    39.16     5.11    28.60
> sdt         0.00     0.00     0.00   161.00     0.00    72.57   923.18     6.97    37.39     5.07    81.70
> sdu         0.00     0.00     0.00    66.00     0.00    30.02   931.64     2.77    39.85     5.09    33.60
> sdw         0.00     0.00     0.00    20.00     0.00     8.51   871.60     1.47    27.80     4.85     9.70
> sdv         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
> sdx         0.00     0.00     0.00    36.00     0.00    16.02   911.11     1.37    38.08     5.72    20.60
> sdq         0.00     0.00     0.00    44.00     0.00    19.46   906.00     1.15    26.02     4.50    19.80
>
> And finally, this is still later, near the end of the run, when things
> have recovered somewhat:
>
> 02/02/2012 09:22:34 AM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           15.25    0.00   52.27   20.88    0.00   11.60
>
> Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
> sdc         0.00     1.00     0.00   217.00     0.00    95.20   898.51    84.43   413.56     4.60    99.90
> sdd         0.00     0.00     0.00    40.00     0.00    16.86   863.00     1.59    28.45     5.55    22.20
> sde         0.00     0.00     0.00   206.00     0.00    99.27   986.95    89.64   452.92     4.85    99.90
> sdh         0.00     0.00     0.00    51.00     0.00    22.53   904.63     2.02    35.45     5.47    27.90
> sdg         0.00     0.00     0.00   230.00     0.00   112.49  1001.63    92.87   283.01     4.33    99.60
> sda         0.00     0.00     0.00   215.00     0.00   106.10  1010.68    94.45   253.40     4.65    99.90
> sdf         0.00     0.00     0.00    73.00     0.00    32.04   898.74     2.20    30.08     5.11    37.30
> sdb         0.00     0.00     0.00    92.00     0.00    40.05   891.48     2.55    27.70     4.85    44.60
> sdj         0.00    44.00     0.00   280.00     0.00    91.61   670.03   109.32   314.59     3.57    99.90
> sdk         0.00     1.00     0.00   210.00     0.00   100.63   981.41    97.79   419.98     4.76    99.90
> sdm         0.00    42.00     0.00   282.00     0.00   100.27   728.23    92.86   285.38     3.54    99.90
> sdn         0.00     0.00     0.00   213.00     0.00   100.81   969.31    41.62   301.33     4.67    99.40
> sdo         0.00    39.00     0.00   306.00     0.00   102.84   688.29    82.44   279.69     3.26    99.70
> sdl         0.00     0.00     0.00   219.00     0.00   104.16   974.06    83.05   421.80     4.56    99.90
> sdp         0.00    46.00     0.00   277.00     0.00    97.01   717.23   106.44   324.31     3.61    99.90
> sdi         0.00     0.00     0.00    56.00     0.00    24.03   878.86     1.73    30.91     5.05    28.30
> sdr         0.00    34.00     0.00   266.00     0.00    97.66   751.91    63.86   304.39     3.76   100.00
> sds         0.00    18.00     0.00    67.00     0.00    17.41   532.18     1.68    25.03     3.79    25.40
> sdt         0.00     0.00     0.00   130.00     0.00    64.01  1008.37    56.33   166.52     4.99    64.90
> sdu         0.00     0.00     0.00   197.00     0.00    95.02   987.82    44.70   282.45     4.95    97.60
> sdw         0.00     0.00     0.00   207.00     0.00    93.39   923.98    90.21   448.08     4.83    99.90
> sdv         0.00     0.00     0.00   204.00     0.00   100.52  1009.14    84.16   425.70     4.85    98.90
> sdx         0.00     0.00     0.00   203.00     0.00    88.75   895.33    87.10   475.92     4.92    99.90
> sdq         0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.52    28.83     4.83     8.70
>
> 02/02/2012 09:22:35 AM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           14.63    0.00   50.99   22.22    0.00   12.16
>
> Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await    svctm    %util
> sdc         0.00     0.00     0.00   209.00     0.00    99.54   975.35    84.02   409.76     4.78    99.90
> sdd         0.00     0.00     0.00    13.00     0.00     5.50   867.08     0.34    57.31     6.23     8.10
> sde         0.00     0.00     0.00   204.00     0.00    98.12   985.06    87.28   418.62     4.88    99.50
> sdh         0.00     0.00     0.00    78.00     0.00    34.12   895.79     2.15    30.26     5.37    41.90
> sdg         0.00     0.00     0.00   226.00     0.00   108.48   983.04    93.54   336.46     4.42    99.80
> sda         0.00     0.00     0.00   219.00     0.00   108.07  1010.63    80.90   510.96     4.53    99.20
> sdf         0.00     6.00     0.00    81.00     0.00    21.20   535.90     1.99    24.47     3.59    29.10
> sdb         0.00     0.00     0.00    71.00     0.00    32.03   923.94     2.46    34.63     4.65    33.00
> sdj         0.00     0.00     0.00   192.00     0.00    83.87   894.62    83.33   459.53     5.21   100.10
> sdk         0.00    41.00     0.00   285.00     0.00    94.12   676.32   104.34   310.17     3.51   100.10
> sdm         0.00     0.00     0.00   202.00     0.00    90.44   916.91    86.45   506.52     4.96   100.10
> sdn         0.00     0.00     0.00   208.00     0.00   101.48   999.23    87.79   323.35     4.79    99.70
> sdo         0.00     1.00     0.00   228.00     0.00   108.63   975.75    89.79   327.24     4.38    99.80
> sdl         0.00    28.00     0.00   270.00     0.00    97.64   740.65    52.06   281.67     3.54    95.60
> sdp         0.00     0.00     0.00   195.00     0.00    85.65   899.57    92.28   453.54     5.14   100.20
> sdi         0.00    14.00     0.00    31.00     0.00     9.02   595.61     0.96    30.94     4.77    14.80
> sdr         0.00     0.00     0.00   192.00     0.00    83.11   886.46    14.22   142.39     5.06    97.10
> sds         0.00     0.00     0.00    18.00     0.00     8.01   911.11     0.73    40.39     5.89    10.60
> sdt         0.00     0.00     0.00   201.00     0.00    98.66  1005.29    65.87   425.37     4.89    98.30
> sdu         0.00     0.00     0.00   209.00     0.00   103.01  1009.38    87.49   285.51     4.74    99.10
> sdw         0.00     0.00     0.00   204.00     0.00    96.74   971.22    82.66   410.50     4.89    99.70
> sdv         0.00     0.00     0.00   198.00     0.00    96.61   999.23    83.39   420.17     5.03    99.50
> sdx         0.00     0.00     0.00   204.00     0.00    98.79   991.80    86.54   428.67     4.90   100.00
> sdq         0.00     0.00     0.00    36.00     0.00    16.02   911.11     0.88    24.33     4.44    16.00
>
>
> The above suggests to me that the slowdown is a result
> of requests not getting submitted at the same rate as
> when things are running well.
>

Yeah, it really looks like that. My suggestions wouldn't help there.

I do see that when things go well the number of writes per device is
capped at ~200 writes per second and the throughput per device is
~100MB/sec. Is 100MB/sec the expected device throughput?
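
For anyone following along, here is a rough sketch of the arithmetic behind the two observations above. It is not part of the original exchange. It takes three devices (sdc, sdd, sda) from the 09:14:13 and 09:21:52 snapshots, derives the average write size from wMB/s divided by w/s, and compares mean queue depth and service time between the two periods. The tuple layout and the helper names (`kib_per_write`, `summarize`) are made up for illustration, and the sketch assumes iostat's "MB" means 1024 KiB and that avgrq-sz is counted in 512-byte sectors. Note also that the sysstat documentation warns svctm is not a trustworthy figure, so treat that column as a coarse hint only.

```python
SECTOR_BYTES = 512  # assumption: avgrq-sz is reported in 512-byte sectors

# Hypothetical tuple layout, chosen for this sketch only:
# (device, w/s, wMB/s, avgrq-sz, avgqu-sz, await, svctm, %util)

# Rows copied from the 09:14:13 snapshot ("running well").
GOOD = [
    ("sdc", 206.00, 101.57, 1009.79, 54.80, 251.27, 4.86, 100.10),
    ("sdd", 202.00, 98.10, 994.61, 27.85, 132.42, 4.96, 100.10),
    ("sda", 212.00, 83.93, 810.75, 18.26, 84.82, 4.68, 99.30),
]

# The same devices from the 09:21:52 snapshot ("not going as well").
SLOW = [
    ("sdc", 36.00, 16.02, 911.11, 1.43, 39.72, 5.64, 20.30),
    ("sdd", 18.00, 8.01, 911.11, 0.85, 47.28, 6.39, 11.50),
    ("sda", 82.00, 36.04, 900.10, 3.13, 37.05, 5.15, 42.20),
]


def kib_per_write(w_per_s, wmb_per_s):
    """Average write size in KiB implied by throughput / write rate."""
    return wmb_per_s * 1024.0 / w_per_s


def mean(rows, col):
    """Arithmetic mean of one column across the given rows."""
    return sum(r[col] for r in rows) / len(rows)


def summarize(rows):
    """Per-period averages of the columns discussed in the thread."""
    return {
        "w_per_s": mean(rows, 1),
        "wmb_per_s": mean(rows, 2),
        "avgqu_sz": mean(rows, 4),
        "svctm": mean(rows, 6),
    }


if __name__ == "__main__":
    for label, rows in (("good", GOOD), ("slow", SLOW)):
        s = summarize(rows)
        print(label, {k: round(v, 2) for k, v in s.items()})
        for dev, w, wmb, avgrq, *_ in rows:
            # Cross-check: both figures should describe the same request size.
            from_rate = kib_per_write(w, wmb)
            from_avgrq = avgrq * SECTOR_BYTES / 1024.0
            print(f"  {dev}: {from_rate:.1f} KiB/write "
                  f"(avgrq-sz implies {from_avgrq:.1f} KiB)")
```

On these rows the implied write size is roughly 500 KiB in the good period, which is why ~200 writes/sec and ~100MB/sec go together. In the slow period the mean queue depth is more than ten times smaller while svctm stays in the same range, which fits Jim's reading that requests are not being submitted, rather than the disks servicing them more slowly.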