Re: Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

Steven Vacaroaia <stef97@xxxxxxxxx> · Wed, 11 Apr 2018 12:48:41 +0000

Thanks for the suggestion, but unfortunately having the same number of OSDs on each host did not solve the issue. Here are the results with 2 OSDs per server on 3 servers, with identical server and OSD configuration:

[root@osd01 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       4.02173 root default
-9       1.14917     host osd01
 5   hdd 0.57458         osd.5      up  1.00000 1.00000
 6   hdd 0.57458         osd.6      up  1.00000 1.00000
-7       1.14899     host osd02
 0   hdd 0.57500         osd.0      up  1.00000 1.00000
 1   hdd 0.57500         osd.1      up  1.00000 1.00000
-3       1.14899     host osd03
 2   hdd 0.57500         osd.2      up  1.00000 1.00000
 3   hdd 0.57500         osd.3      up  1.00000 1.00000
-4       0.57458     host osd04
 4   hdd 0.57458         osd.4      up        0 1.00000
[root@osd01 ~]# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS TYPE NAME
-1       4.02173        - 1176G 89108M 1089G    0    0   - root default
-9       1.14917        - 1176G 89108M 1089G 7.39 1.02   -     host osd01
 5   hdd 0.57458  1.00000  588G 44498M  544G 7.39 1.02  47         osd.5
 6   hdd 0.57458  1.00000  588G 44610M  544G 7.40 1.02  46         osd.6
-7       1.14899        - 1176G 84472M 1094G 7.01 0.96   -     host osd02
 0   hdd 0.57500  1.00000  588G 42290M  547G 7.02 0.97  35         osd.0
 1   hdd 0.57500  1.00000  588G 42182M  547G 7.00 0.96  37         osd.1
-3       1.14899        - 1176G 89320M 1089G 7.41 1.02   -     host osd03
 2   hdd 0.57500  1.00000  588G 45370M  544G 7.53 1.04  50         osd.2
 3   hdd 0.57500  1.00000  588G 43950M  545G 7.29 1.00  41         osd.3
-4       0.57458        -     0      0     0    0    0   -     host osd04
 4   hdd 0.57458        0     0      0     0    0    0   0         osd.4
                    TOTAL 4118G   287G 3830G 7.27
MIN/MAX VAR: 0.96/1.04  STDDEV: 0.20
[root@osd01 ~]# rados bench -p rbd 120 write --no-cleanup && rados bench -p rbd 120 seq
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 120 seconds or 0 objects
Object prefix: benchmark_data_osd01.tor.medavail.net_83835
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        52        36   143.993       144     0.01749   0.0387657
    2      16        52        36   71.9932         0           -   0.0387657
    3      16        62        46   61.3276        20   0.0241346    0.254428
    4      16       104        88   87.9915       168   0.0135851    0.646529
    5      16       121       105   83.9918        68   0.0152886    0.551564
    6      16       131       115   76.6591        40   0.0174347    0.517888
    7      16       131       115   65.7078         0           -    0.517888
    8      16       152       136   67.9934        42   0.0178455    0.674487
    9      16       209       193   85.7693       228   0.0202116    0.640473
   10      16       216       200    79.992        28   0.0172787    0.619349
   11      16       229       213   77.4468        52   0.0160566    0.674538
   12      16       229       213   70.9929         0           -    0.674538
   13      16       229       213    65.532         0           -    0.674538
   14      16       263       247   70.5645   45.3333    0.127854    0.734526
   15      16       272       256     68.26        36    0.044047    0.772968
   16      16       282       266   66.4934        40    0.055596    0.753213
   17      16       298       282   66.3464        64   0.0185164    0.906061
   18      16       303       287   63.7714        20   0.0163462    0.907965
   19      16       350       334   70.3088       188   0.0320304    0.907601
2018-04-11 08:46:46.521478 min lat: 0.0135851 max lat: 9.31766 avg lat: 0.807083
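
The seconds in which no write completed stand out in the output above (cur MB/s is 0 and last lat is "-"). A minimal sketch for listing them from saved `rados bench` output follows; the function names are hypothetical and the parser assumes the 8-column per-second layout shown above.

```python
# Per-second samples copied from the rados bench write run above.
SAMPLE = """\
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        52        36   143.993       144     0.01749   0.0387657
    2      16        52        36   71.9932         0           -   0.0387657
    3      16        62        46   61.3276        20   0.0241346    0.254428
    4      16       104        88   87.9915       168   0.0135851    0.646529
    5      16       121       105   83.9918        68   0.0152886    0.551564
    6      16       131       115   76.6591        40   0.0174347    0.517888
    7      16       131       115   65.7078         0           -    0.517888
    8      16       152       136   67.9934        42   0.0178455    0.674487
    9      16       209       193   85.7693       228   0.0202116    0.640473
   10      16       216       200    79.992        28   0.0172787    0.619349
   11      16       229       213   77.4468        52   0.0160566    0.674538
   12      16       229       213   70.9929         0           -    0.674538
   13      16       229       213    65.532         0           -    0.674538
   14      16       263       247   70.5645   45.3333    0.127854    0.734526
   15      16       272       256     68.26        36    0.044047    0.772968
   16      16       282       266   66.4934        40    0.055596    0.753213
   17      16       298       282   66.3464        64   0.0185164    0.906061
   18      16       303       287   63.7714        20   0.0163462    0.907965
   19      16       350       334   70.3088       188   0.0320304    0.907601
"""


def parse_bench(text):
    """Return one dict per per-second sample row (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Sample rows have exactly 8 columns and start with the second counter;
        # this skips the header, the hints line and the min/max latency summary.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        rows.append({
            "sec": int(fields[0]),
            "finished": int(fields[3]),
            "cur_mbs": float(fields[5]),
        })
    return rows


def stalled_seconds(rows):
    """Seconds (after second 0) in which the current throughput was zero."""
    return [r["sec"] for r in rows if r["sec"] > 0 and r["cur_mbs"] == 0.0]


if __name__ == "__main__":
    print("stalled seconds:", stalled_seconds(parse_bench(SAMPLE)))
```

Lining these seconds up against per-disk statistics taken at the same time should show which OSDs were busy during each stall.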

On Wed, 11 Apr 2018 at 01:57, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
> ceph osd df tree
> ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE  VAR  PGS TYPE NAME
> -1       3.44714        -  588G 80693M  509G     0    0   - root default
> -9       0.57458        -  588G 80693M  509G 13.39 1.13   -     host osd01
>   5   hdd 0.57458  1.00000  588G 80693M  509G 13.39 1.13  64         osd.5
> -7       1.14899        - 1176G   130G 1046G 11.06 0.94   -     host osd02
>   0   hdd 0.57500  1.00000  588G 70061M  519G 11.63 0.98  50         osd.0
>   1   hdd 0.57500  1.00000  588G 63200M  526G 10.49 0.89  41         osd.1
> -3       1.14899        - 1176G   138G 1038G 11.76 1.00   -     host osd03
>   2   hdd 0.57500  1.00000  588G 68581M  521G 11.38 0.96  48         osd.2
>   3   hdd 0.57500  1.00000  588G 73185M  516G 12.15 1.03  53         osd.3
> -4       0.57458        -     0      0     0     0    0   -     host osd04
>   4   hdd 0.57458        0     0      0     0     0    0   0         osd.4

By adding new hosts with half as many OSDs as the existing hosts, you unbalance your CRUSH map.
osd.4 and osd.5 do double the work compared with the OSDs on the existing hosts if your failure domain is host.

k
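
The "double work" point can be illustrated with a little arithmetic. This is only a sketch under stated assumptions, not a model of real CRUSH placement: it assumes a replicated pool whose size equals the number of active hosts (so every host holds one replica of each PG) and that replicas are split within a host in proportion to CRUSH weight. The pool size is not given in the thread, and the function name is hypothetical. osd.4 is left out because its reweight is 0.

```python
# CRUSH weights taken from the quoted "ceph osd df tree" output.
# osd04/osd.4 is omitted: its REWEIGHT is 0, so it receives no data.
HOSTS = {
    "osd01": {"osd.5": 0.57458},
    "osd02": {"osd.0": 0.57500, "osd.1": 0.57500},
    "osd03": {"osd.2": 0.57500, "osd.3": 0.57500},
}


def per_osd_replica_share(hosts):
    """Fraction of all PGs each OSD holds a replica of.

    ASSUMPTION: one replica of every PG lands on every host (failure
    domain = host, pool size = number of hosts), and a host's replicas
    are divided among its OSDs in proportion to their CRUSH weight.
    """
    shares = {}
    for osds in hosts.values():
        host_weight = sum(osds.values())
        for osd, weight in osds.items():
            shares[osd] = weight / host_weight
    return shares


if __name__ == "__main__":
    for osd, share in sorted(per_osd_replica_share(HOSTS).items()):
        print(f"{osd}: holds a replica of {share:.0%} of PGs")
```

Under these assumptions the single OSD on osd01 carries a replica of every PG, while each OSD on the two-OSD hosts carries about half of them.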
