Some info to update: the data disks are 3TB SATA, Model Number: WDC
WD3000FYYZ-01UL1B1. Today I tried setting the reweight of osd.0 to 0.1
and then checked again; iostat turned up some useful data:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.63    0.00    0.48   16.15    0.00   81.75

Device:  rrqm/s wrqm/s   r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00   0.00  0.00  2.00   0.00   1.00  1024.00    39.85 1134.00    0.00 1134.00 500.00 100.00
sda1       0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda2       0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda3       0.00   0.00  0.00  0.00   0.00   0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.40
sda4       0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda5       0.00   0.00  0.00  2.00   0.00   1.00  1024.00    32.32 1134.00    0.00 1134.00 502.00 100.40
sda6       0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.66    0.00    0.00    0.00   0.00  66.40
sda7       0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda8       0.00   0.00  0.00  0.00   0.00   0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00
sda9       0.00   0.00  0.00  0.00   0.00   0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00
sda10      0.00   0.00  0.00  0.00   0.00   0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00

root@node-65:~# ls -l /var/lib/ceph/osd/ceph-0
total 62924048
-rw-r--r--   1 root root         487 Oct 12 16:49 activate.monmap
-rw-r--r--   1 root root           3 Oct 12 16:49 active
-rw-r--r--   1 root root          37 Oct 12 16:49 ceph_fsid
drwxr-xr-x 280 root root        8192 Mar 30 11:58 current
-rw-r--r--   1 root root          37 Oct 12 16:49 fsid
lrwxrwxrwx   1 root root           9 Oct 12 16:49 journal -> /dev/sda5
-rw-------   1 root root          56 Oct 12 16:49 keyring
-rw-r--r--   1 root root          21 Oct 12 16:49 magic
-rw-r--r--   1 root root           6 Oct 12 16:49 ready
-rw-r--r--   1 root root           4 Oct 12 16:49 store_version
-rw-r--r--   1 root root          42 Oct 12 16:49 superblock
-rw-r--r--   1 root root 64424509440 Mar 30 10:20 testfio.file
-rw-r--r--   1 root root           0 Mar 28 09:54 upstart
-rw-r--r--   1 root root           2 Oct 12 16:49 whoami

The journal of osd.0 is on sda5, and it is completely saturated: 100%
%util with an await of over 1100 ms. CPU iowait in top is 30% and the
whole system is slow.

So maybe the problem is sda5? It is an INTEL SSDSC2BB120G4; we use two
SSDs for the journals and the system:

root@node-65:~# lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda        8:0    0 111.8G  0 disk
├─sda1     8:1    0    22M  0 part
├─sda2     8:2    0   191M  0 part /boot
├─sda3     8:3    0  43.9G  0 part /
├─sda4     8:4    0   3.8G  0 part [SWAP]
├─sda5     8:5    0  10.2G  0 part
├─sda6     8:6    0  10.2G  0 part
├─sda7     8:7    0  10.2G  0 part
├─sda8     8:8    0  10.2G  0 part
├─sda9     8:9    0  10.2G  0 part
└─sda10    8:10   0  10.2G  0 part
sdb        8:16   0 111.8G  0 disk
├─sdb1     8:17   0    24M  0 part
├─sdb2     8:18   0  10.2G  0 part
├─sdb3     8:19   0  10.2G  0 part
├─sdb4     8:20   0  10.2G  0 part
├─sdb5     8:21   0  10.2G  0 part
├─sdb6     8:22   0  10.2G  0 part
├─sdb7     8:23   0  10.2G  0 part
└─sdb8     8:24   0  50.1G  0 part
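
As far as I understand it, `ceph osd perf` does not probe the disk at
all: each OSD reports its own filestore commit/apply latency counters
to the monitors, so a sick journal device keeps showing huge numbers
even when the OSD takes no client I/O (which would explain high
latencies on OSDs reweighted to 0). A minimal sketch to look at the
raw counters on the OSD node, assuming the default admin socket and
that the counter names match the versions I have seen:

# dump osd.0's internal perf counters and pick out the latency values
root@node-65:~# ceph daemon osd.0 perf dump | python -m json.tool | grep -B1 -A3 latency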
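
Before benchmarking I would also check the SSD's SMART data; these
Intel DC drives expose wear and error counters (attribute names vary
by model, so the grep pattern below is only a sketch):

# quick health check of the journal SSD
root@node-65:~# smartctl -a /dev/sda | grep -i -E 'wearout|wear_leveling|reallocated|error'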
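
To confirm whether the SSD itself is the problem, the usual test for a
journal device is small synchronous writes with fio. WARNING: writing
to /dev/sda5 destroys the live journal, so this is only a sketch of
what I would run with osd.0 stopped and its journal flushed,
recreating the journal afterwards (upstart service names, as on this
node; the fio job name is arbitrary):

root@node-65:~# stop ceph-osd id=0
root@node-65:~# ceph-osd -i 0 --flush-journal
# 4k O_DIRECT+O_SYNC sequential writes at queue depth 1 for 60 seconds
root@node-65:~# fio --name=journal-test --filename=/dev/sda5 --direct=1 --sync=1 \
        --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based
root@node-65:~# ceph-osd -i 0 --mkjournal
root@node-65:~# start ceph-osd id=0

A healthy DC-class SSD should sustain this at well under a millisecond
per write; anything near the 1134 ms await above would point at the
drive (or its firmware/controller) rather than at Ceph.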
2016-03-30 11:17 GMT+08:00 lin zhou <hnuzhoulin2@xxxxxxxxx>:
> Hi, cephers,
>
> Some OSDs show high latency in the output of `ceph osd perf`, but I
> have set the reweight of these OSDs in the crushmap to 0.0, and when
> I check these disks with iostat there is no load.
>
> So how does the command `ceph osd perf` work?
>
> root@node-67:~# ceph osd perf
> osdid fs_commit_latency(ms) fs_apply_latency(ms)
>     0                  9819                10204
>     1                     1                    2
>     2                     1                    2
>     3                  9126                 9496
>     4                     2                    3
>     5                     2                    3
>     6                     0                    0
>     7                     0                    1
>     8                     1                    2
>     9                 32960                33953
>    10                     2                    3
>    11                     1                   31
>    12                 23450                24324
>    13                     0                    1
>    14                     1                    2
>    15                 18882                19622
>    16                     2                    4
>    17                     1                    2
>    18                     3                   46
>
> root@node-67:~# ceph osd tree
> # id    weight  type name       up/down reweight
> -1      103.7   root default
> -2      8.19            host node-65
> 18      2.73                    osd.18  up      1
> 21      0                       osd.21  up      1
> 24      2.73                    osd.24  up      1
> 27      2.73                    osd.27  up      1
> 30      0                       osd.30  up      1
> 33      0                       osd.33  up      1
> 0       0                       osd.0   up      1
> 3       0                       osd.3   up      1
> 6       0                       osd.6   down    0
> 9       0                       osd.9   up      1
> 12      0                       osd.12  up      1
> 15      0                       osd.15  up      1
>
> root@node-65:~# iostat -xm 1 -p sdd,sde,sdf,sdg
> Linux 3.11.0-26-generic (node-65)  03/30/2016  _x86_64_  (40 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            3.34    0.00    0.61    2.60    0.00   93.45
>
> Device:  rrqm/s wrqm/s   r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sdd        0.00   0.05  0.08  0.16   0.00   0.01    64.48     0.01  29.11    4.21   42.09   1.70   0.04
> sdd1       0.00   0.05  0.08  0.16   0.00   0.01    64.54     0.01  29.14    4.22   42.09   1.71   0.04
> sde        0.00   0.00  0.04  0.00   0.00   0.00    15.59     0.00   4.00    4.00    0.85   3.86   0.01
> sde1       0.00   0.00  0.04  0.00   0.00   0.00    15.62     0.00   4.01    4.02    0.85   3.88   0.01
> sdf        0.00   0.04  0.39  0.31   0.04   0.01   148.58     0.01  13.58    2.61   27.50   1.17   0.08
> sdf1       0.00   0.04  0.39  0.31   0.04   0.01   148.63     0.01  13.59    2.61   27.50   1.17   0.08
> sdg        0.00   0.06  0.22  0.24   0.02   0.02   162.41     0.01  23.64    2.85   42.89   1.49   0.07
> sdg1       0.00   0.06  0.22  0.24   0.02   0.02   162.49     0.01  23.65    2.86   42.89   1.49   0.07
>
> root@node-65:~# top
>
> top - 11:16:34 up 11 days, 19:26,  4 users,  load average: 4.46, 4.73, 3.66
> Tasks: 588 total,   1 running, 587 sleeping,   0 stopped,   0 zombie
> Cpu(s):  3.6%us,  0.6%sy,  0.0%ni, 93.5%id,  2.1%wa,  0.0%hi,  0.1%si,  0.0%st
> Mem:  131998604k total, 69814800k used, 62183804k free,   216612k buffers
> Swap:   3999740k total,        0k used,  3999740k free, 14093896k cached
>
>   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 17000 libvirt- 20  0 39.7g  32g  12m S  109 25.7 18417:09 qemu-system-x86
>  8852 root     20  0 2116m 854m 9896 S   20  0.7 303:24.39 ceph-osd
>  9319 root     20  0 2285m 974m  12m S   16  0.8 385:24.69 ceph-osd
>  8797 root     20  0 2298m 929m  10m S   10  0.7 298:14.61 ceph-osd
> 17001 root     20  0     0    0    0 S    4  0.0 384:38.16 vhost-17000
> 24241 nova     20  0  148m  64m 4908 S    2  0.1  89:47.86 nova-network
> 24189 nova     20  0  125m  37m 4880 S    1  0.0  49:48.06 nova-api
>     1 root     20  0 24720 2644 1376 S    0  0.0   4:35.38 init
>    50 root     20  0     0    0    0 S    0  0.0  46:32.66 rcu_sched
>    52 root     20  0     0    0    0 S    0  0.0   1:21.29 rcuos/1
>    64 root     20  0     0    0    0 S    0  0.0   4:34.05 rcuos/13
>    67 root     20  0     0    0    0 S    0  0.0   3:06.16 rcuos/16
>    68 root     20  0     0    0    0 S    0  0.0   2:45.34 rcuos/17
>  4732 root     20  0 1650m 807m  15m S    0  0.6   3:24.89 ceph-osd

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com