The plot thickens:
I checked the C-states and apparently I am operating in C1 with all CPUs online. It turns out the servers were tuned with the latency-performance profile:
# tuned-adm active
Current active profile: latency-performance
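For anyone who wants to double-check, two other ways to see which C-states the cores actually reach (the cpuidle sysfs path is standard on recent kernels; cpupower typically ships in kernel-tools on RHEL/CentOS):

# cpupower idle-info
# grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/{name,usage}

With latency-performance I'd expect only POLL and C1 to show meaningful usage there, which matches the turbostat output below.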
turbostat shows the cores spending ~99% of their time in C1 (CPU%c1), with the deeper C3/C6/C7 states never entered:
Package  Core  CPU  Avg_MHz  %Busy  Bzy_MHz  TSC_MHz  SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7  CoreTmp  PkgTmp  Pkg%pc2  Pkg%pc3  Pkg%pc6  Pkg%pc7  PkgWatt  RAMWatt  PKG_%  RAM_%
      -     -    -       22   0.84     2600     2400    0   99.16    0.00    0.00    0.00       49      58     0.00     0.00     0.00     0.00    69.51    17.29   0.00   0.00
      0     0    0       39   1.52     2600     2400    0   98.48    0.00    0.00    0.00       48      58     0.00     0.00     0.00     0.00    36.30     8.73   0.00   0.00
      0     0   12       15   0.56     2600     2400    0   99.44
      0     1    2       47   1.81     2600     2400    0   98.19    0.00    0.00    0.00       49
      0     1   14       17   0.66     2600     2400    0   99.34
      0     2    4       31   1.20     2600     2400    0   98.80    0.00    0.00    0.00       47
      0     2   16       18   0.71     2600     2400    0   99.29
      0     3    6       31   1.21     2600     2400    0   98.79    0.00    0.00    0.00       49
      0     3   18       39   1.50     2600     2400    0   98.50
      0     4    8       33   1.27     2600     2400    0   98.73    0.00    0.00    0.00       46
      0     4   20       17   0.64     2600     2400    0   99.36
      0     5   10       32   1.23     2600     2400    0   98.77    0.00    0.00    0.00       48
      0     5   22       20   0.76     2600     2400    0   99.24
      1     0    1       25   0.95     2600     2400    0   99.05    0.00    0.00    0.00       44      52     0.00     0.00     0.00     0.00    33.21     8.56   0.00   0.00
      1     0   13        9   0.34     2600     2400    0   99.66
      1     1    3        9   0.35     2600     2400    0   99.65    0.00    0.00    0.00       42
      1     1   15       11   0.42     2600     2400    0   99.58
      1     2    5       30   1.17     2600     2400    0   98.83    0.00    0.00    0.00       46
      1     2   17        7   0.28     2600     2400    0   99.72
      1     3    7       10   0.40     2600     2400    0   99.60    0.00    0.00    0.00       44
      1     3   19       10   0.37     2600     2400    0   99.63
      1     4    9        9   0.36     2600     2400    0   99.64    0.00    0.00    0.00       45
      1     4   21        7   0.27     2600     2400    0   99.73
      1     5   11       12   0.45     2600     2400    0   99.55    0.00    0.00    0.00       45
      1     5   23       46   1.76     2600     2400    0   98.24
iostat for the SSD shows consistently low write latencies (w_await) and utilization:
# iostat -xd -p sdb 1 1000
Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.05  26.78   0.20  2299.53    171.42      0.02   0.64     0.11     0.64   0.08   0.20
sdb        0.00    0.00  0.00  16.00   0.00   392.00     49.00      0.00   0.06     0.00     0.06   0.06   0.10
sdb        0.00    0.00  0.00  74.00   0.00   880.00     23.78      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  56.00   0.00   240.00      8.57      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  44.00   0.00   676.00     30.73      0.00   0.07     0.00     0.07   0.05   0.20
sdb        0.00    0.00  0.00  10.00   0.00    92.00     18.40      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00   6.00   0.00    84.00     28.00      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00   1.00   0.00    20.00     40.00      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  25.00   0.00   212.00     16.96      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  14.00   0.00   100.00     14.29      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00   5.00   0.00   112.00     44.80      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  13.00   0.00   508.00     78.15      0.00   0.15     0.00     0.15   0.15   0.20
sdb        0.00    0.00  0.00  49.00   0.00   820.00     33.47      0.01   0.10     0.00     0.10   0.08   0.40
sdb        0.00    0.00  0.00   7.00   0.00    52.00     14.86      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  18.00   0.00   180.00     20.00      0.00   0.06     0.00     0.06   0.06   0.10
sdb        0.00    0.00  0.00  34.00   0.00   476.00     28.00      0.00   0.06     0.00     0.06   0.06   0.20
sdb        0.00    0.00  1.00  12.00   4.00   156.00     24.62      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  32.00   0.00   940.00     58.75      0.00   0.03     0.00     0.03   0.03   0.10
sdb        0.00    0.00  0.00  13.00   0.00   456.00     70.15      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  37.00   0.00   536.00     28.97      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00   6.00   0.00    60.00     20.00      0.00   0.17     0.00     0.17   0.17   0.10
sdb        0.00    0.00  0.00   3.00   0.00    48.00     32.00      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00    0.00  0.00  10.00   0.00  1452.00    290.40      0.00   0.30     0.00     0.30   0.20   0.20
On 11/17/2018 3:42 PM, John Petrini wrote:
You can check whether C-states are enabled with cat /proc/acpi/processor/info; look for "power management: yes/no". If they are enabled, you can check the current C-state of each core with cat /proc/acpi/processor/CPU?/power. C0 is the CPU's normal operating state; any other state means the processor is in a power-saving mode.
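To dump that for every core in one go, something like this should work, assuming your kernel still exposes the legacy ACPI procfs interface (newer kernels only provide /sys/devices/system/cpu/cpu*/cpuidle/ instead):

# for f in /proc/acpi/processor/CPU*/power; do echo "== $f =="; grep 'active state' "$f"; done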
C-states are configured in the BIOS, so a reboot is required to change them. I know that with Dell servers you can trigger the change with omconfig and then issue a reboot for it to take effect; otherwise you'll need to disable it directly in the BIOS.
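For example, on a Dell box with OMSA installed this is roughly the shape of it; take the attribute name as an assumption on my part, since it varies by server generation:

# omconfig chassis biossetup attribute=cstates setting=disabled
# reboot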
As for the SSDs, I would just run iostat and check the iowait. If you see small disk writes causing high iowait, then your SSDs are probably at the end of their life. Ceph journaling is good at destroying SSDs.
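If you want a second data point on wear beyond iostat, the SMART counters are worth a look; the grep pattern here is a guess, since the attribute names are vendor-specific (Intel reports Media_Wearout_Indicator, Samsung reports Wear_Leveling_Count, and so on):

# smartctl -a /dev/sdb | grep -i -E 'wear|life|percent'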