Re: Some OSDs never get any data or PGs


We finally figured out that this was happening because of an unbalanced rack structure. When we moved the host and its OSDs to another rack, they worked just fine. We have now balanced the racks by moving hosts. Some rebalancing happened as a result, but everything is fine now.
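
For readers following along, here is a rough sketch of the arithmetic behind the imbalance, using the pg_num values and the CRUSH map quoted further down in this thread. It assumes every pool has size 3 (inferred from the SUM row of the per-OSD table, e.g. 384 / 128) and that the rule places each replica in a distinct rack. Neither assumption is stated outright in the thread, and the function name is made up for illustration.

```python
# Figures taken from the thread: pg_num per pool and the OSD count per rack.
PG_NUM = {"rbd": 64, "compute": 512, "volumes": 1024, "images": 128}
OSDS_PER_RACK = {"rack_A1": 20, "rack_B1": 10, "rack_C1": 10}  # rack_D1 is empty

# Assumption: every pool has size 3 (inferred from the SUM row, e.g. 384 / 128).
REPLICAS = 3


def expected_pgs_per_osd(pg_num, osds_per_rack, replicas):
    """Expected PG replicas per OSD when each replica lands in a distinct rack."""
    if replicas != len(osds_per_rack):
        raise ValueError("sketch only covers replicas == number of usable racks")
    total_pgs = sum(pg_num.values())
    # One replica of every PG goes to each rack, spread over that rack's OSDs.
    return {rack: total_pgs / n for rack, n in osds_per_rack.items()}


expected = expected_pgs_per_osd(PG_NUM, OSDS_PER_RACK, REPLICAS)
for rack, per_osd in sorted(expected.items()):
    print(f"{rack}: {per_osd:.1f} PG replicas per OSD")
```

With three replicas and only three racks that hold any weight, each rack ends up with one copy of every PG, so the 20 OSDs in rack_A1 share the same number of PGs that 10 OSDs share in each of the other racks. This accounts for the roughly 2:1 difference in PG counts per OSD. It does not by itself explain why osd.38 received none.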

Thanks,
Pardhiv Karri


On Tue, May 22, 2018 at 11:34 AM, Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
Hi,

Here is the complete CRUSH map that we are using.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host or1010051251040 {
id -3 # do not change unnecessarily
# weight 20.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item osd.0 weight 2.000 pos 0
item osd.1 weight 2.000 pos 1
item osd.2 weight 2.000 pos 2
item osd.3 weight 2.000 pos 3
item osd.4 weight 2.000 pos 4
item osd.5 weight 2.000 pos 5
item osd.6 weight 2.000 pos 6
item osd.7 weight 2.000 pos 7
item osd.8 weight 2.000 pos 8
item osd.9 weight 2.000 pos 9
}
host or1010051251044 {
id -8 # do not change unnecessarily
# weight 20.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item osd.30 weight 2.000 pos 0
item osd.31 weight 2.000 pos 1
item osd.32 weight 2.000 pos 2
item osd.33 weight 2.000 pos 3
item osd.34 weight 2.000 pos 4
item osd.35 weight 2.000 pos 5
item osd.36 weight 2.000 pos 6
item osd.37 weight 2.000 pos 7
item osd.38 weight 2.000 pos 8
item osd.39 weight 2.000 pos 9
}
rack rack_A1 {
id -2 # do not change unnecessarily
# weight 40.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item or1010051251040 weight 20.000 pos 0
item or1010051251044 weight 20.000 pos 1
}
host or1010051251041 {
id -5 # do not change unnecessarily
# weight 20.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item osd.10 weight 2.000 pos 0
item osd.11 weight 2.000 pos 1
item osd.12 weight 2.000 pos 2
item osd.13 weight 2.000 pos 3
item osd.14 weight 2.000 pos 4
item osd.15 weight 2.000 pos 5
item osd.16 weight 2.000 pos 6
item osd.17 weight 2.000 pos 7
item osd.18 weight 2.000 pos 8
item osd.19 weight 2.000 pos 9
}
host or1010051251045 {
id -9 # do not change unnecessarily
# weight 0.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
}
rack rack_B1 {
id -4 # do not change unnecessarily
# weight 20.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item or1010051251041 weight 20.000 pos 0
item or1010051251045 weight 0.000 pos 1
}
host or1010051251042 {
id -7 # do not change unnecessarily
# weight 20.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item osd.20 weight 2.000 pos 0
item osd.21 weight 2.000 pos 1
item osd.22 weight 2.000 pos 2
item osd.23 weight 2.000 pos 3
item osd.24 weight 2.000 pos 4
item osd.25 weight 2.000 pos 5
item osd.26 weight 2.000 pos 6
item osd.27 weight 2.000 pos 7
item osd.28 weight 2.000 pos 8
item osd.29 weight 2.000 pos 9
}
host or1010051251046 {
id -10 # do not change unnecessarily
# weight 0.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
}
host or1010051251023 {
id -11 # do not change unnecessarily
# weight 0.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
}
rack rack_C1 {
id -6 # do not change unnecessarily
# weight 20.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item or1010051251042 weight 20.000 pos 0
item or1010051251046 weight 0.000 pos 1
item or1010051251023 weight 0.000 pos 2
}
host or1010051251048 {
id -12 # do not change unnecessarily
# weight 0.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
}
rack rack_D1 {
id -13 # do not change unnecessarily
# weight 0.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item or1010051251048 weight 0.000 pos 0
}
root default {
id -1 # do not change unnecessarily
# weight 80.000
alg tree # do not change pos for existing items unnecessarily
hash 0 # rjenkins1
item rack_A1 weight 40.000 pos 0
item rack_B1 weight 20.000 pos 1
item rack_C1 weight 20.000 pos 2
item rack_D1 weight 0.000 pos 3
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type rack
step emit
}

# end crush map
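
As a quick way to see how weight is spread across the racks, here is a minimal sketch that parses bucket definitions in the decompiled text format shown above. Only the rack and root buckets are pasted in, reduced to their item lines. The regular expressions and the function name are assumptions made for this example, not part of any Ceph tool.

```python
import re

# Rack and root buckets from the CRUSH map above, item lines only
# (host buckets and the id/alg/hash lines are left out for brevity).
CRUSH_TEXT = """
rack rack_A1 {
item or1010051251040 weight 20.000 pos 0
item or1010051251044 weight 20.000 pos 1
}
rack rack_B1 {
item or1010051251041 weight 20.000 pos 0
item or1010051251045 weight 0.000 pos 1
}
rack rack_C1 {
item or1010051251042 weight 20.000 pos 0
item or1010051251046 weight 0.000 pos 1
item or1010051251023 weight 0.000 pos 2
}
rack rack_D1 {
item or1010051251048 weight 0.000 pos 0
}
root default {
item rack_A1 weight 40.000 pos 0
item rack_B1 weight 20.000 pos 1
item rack_C1 weight 20.000 pos 2
item rack_D1 weight 0.000 pos 3
}
"""

BUCKET_RE = re.compile(r"^(\w+)\s+(\S+)\s*\{$")
ITEM_RE = re.compile(r"^item\s+(\S+)\s+weight\s+([\d.]+)")


def bucket_weights(text):
    """Return {bucket_name: {item_name: weight}} for each bucket in the text."""
    buckets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = BUCKET_RE.match(line)
        if header:
            current = header.group(2)
            buckets[current] = {}
            continue
        if line == "}":
            current = None
            continue
        item = ITEM_RE.match(line)
        if item and current is not None:
            buckets[current][item.group(1)] = float(item.group(2))
    return buckets


buckets = bucket_weights(CRUSH_TEXT)
root = buckets["default"]
total = sum(root.values())
for rack, weight in root.items():
    print(f"{rack}: weight {weight:.0f}, {100 * weight / total:.0f}% of root")

# Items that carry no weight at all (empty hosts and the empty rack).
empty = sorted(name for b in buckets.values() for name, w in b.items() if w == 0.0)
print("zero-weight items:", ", ".join(empty))
```

Run against this map, it shows rack_A1 holding half of the root weight while rack_B1 and rack_C1 hold a quarter each, with rack_D1 and four hosts carrying no weight.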

Thanks,
Pardhiv Karri

On Tue, May 22, 2018 at 9:58 AM, Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
Hi David,

We are using the tree algorithm.



Thanks,
Pardhiv Karri

On Tue, May 22, 2018 at 9:42 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
Your per-pool, per-OSD PG counts show no PGs at all on osd.38. That definitely matches what you're seeing, but I've never seen this happen before. The OSD doesn't seem to be misconfigured at all.

Does anyone have any idea what could be happening here? I expected to see something wrong in one of those outputs, but it all looks good. It is possibly something to do with straw vs. straw2 or the CRUSH tunables.


On Tue, May 22, 2018, 12:33 PM Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
Hi David,

root@or1010051251044:~# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED 
    79793G     56832G       22860G         28.65 
POOLS:
    NAME        ID     USED      %USED     MAX AVAIL     OBJECTS 
    rbd         0          0         0        14395G           0 
    compute     1          0         0        14395G           0 
    volumes     2      7605G     28.60        14395G     1947372 
    images      4          0         0        14395G           0 
root@or1010051251044:~#



pool : 4 0 1 2 | SUM 
------------------------------------------------
osd.10 8 10 44 96 | 158
osd.11 14 8 58 100 | 180
osd.12 12 6 50 95 | 163
osd.13 14 4 49 121 | 188
osd.14 9 8 54 86 | 157
osd.15 12 5 55 103 | 175
osd.16 23 5 56 99 | 183
osd.30 6 4 31 47 | 88
osd.17 8 8 50 114 | 180
osd.31 7 1 23 35 | 66
osd.18 15 5 42 94 | 156
osd.32 12 6 24 54 | 96
osd.19 13 5 54 116 | 188
osd.33 4 2 28 49 | 83
osd.34 7 5 18 62 | 92
osd.35 10 2 21 56 | 89
osd.36 5 1 34 35 | 75
osd.37 4 4 24 45 | 77
osd.39 14 8 48 106 | 176
osd.0 12 3 27 67 | 109
osd.1 8 3 27 43 | 81
osd.2 4 5 27 45 | 81
osd.3 4 3 19 50 | 76
osd.4 4 1 23 54 | 82
osd.5 4 2 23 56 | 85
osd.6 1 5 32 50 | 88
osd.7 9 1 32 66 | 108
osd.8 7 4 27 49 | 87
osd.9 6 4 24 55 | 89
osd.20 7 4 43 122 | 176
osd.21 14 5 46 95 | 160
osd.22 13 8 51 107 | 179
osd.23 11 7 54 105 | 177
osd.24 11 6 52 112 | 181
osd.25 16 6 36 98 | 156
osd.26 15 7 59 101 | 182
osd.27 7 9 58 101 | 175
osd.28 16 5 60 89 | 170
osd.29 18 7 53 94 | 172
------------------------------------------------
SUM : 384 192 1536 3072
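
One way to read the table above is to total the SUM column per rack and look for OSD ids that have no row. The sketch below does that with the numbers copied from the table; the OSD-to-rack mapping comes from the CRUSH map elsewhere in this thread, and the helper name is invented for the example.

```python
# SUM column of the per-OSD table above, keyed by OSD id.
PG_TOTALS = {
    0: 109, 1: 81, 2: 81, 3: 76, 4: 82, 5: 85, 6: 88, 7: 108, 8: 87, 9: 89,
    10: 158, 11: 180, 12: 163, 13: 188, 14: 157,
    15: 175, 16: 183, 17: 180, 18: 156, 19: 188,
    20: 176, 21: 160, 22: 179, 23: 177, 24: 181,
    25: 156, 26: 182, 27: 175, 28: 170, 29: 172,
    30: 88, 31: 66, 32: 96, 33: 83, 34: 92, 35: 89, 36: 75, 37: 77, 39: 176,
}


def rack_of(osd_id):
    """Map an OSD id to its rack, following the CRUSH map in this thread."""
    if 10 <= osd_id <= 19:
        return "rack_B1"
    if 20 <= osd_id <= 29:
        return "rack_C1"
    return "rack_A1"  # osd.0-9 and osd.30-39


# OSD ids 0..39 exist in the cluster; any id without a row holds no PGs.
missing = [i for i in range(40) if i not in PG_TOTALS]

rack_totals = {}
for osd_id, pgs in PG_TOTALS.items():
    rack = rack_of(osd_id)
    rack_totals[rack] = rack_totals.get(rack, 0) + pgs

print("OSDs with no PGs:", missing)
for rack in sorted(rack_totals):
    print(f"{rack}: {rack_totals[rack]} PG replicas")
```

Every rack comes out at the same total, which equals the sum of pg_num over the four pools. That is consistent with each rack holding exactly one replica of every PG, regardless of how many OSDs it contains.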



root@or1010051251044:~# for i in `rados lspools`; do echo "================="; echo Working on pool: $i; ceph osd pool get $i pg_num; ceph osd pool get $i pgp_num; done
=================
Working on pool: rbd
pg_num: 64
pgp_num: 64
=================
Working on pool: compute
pg_num: 512
pgp_num: 512
=================
Working on pool: volumes
pg_num: 1024
pgp_num: 1024
=================
Working on pool: images
pg_num: 128
pgp_num: 128
root@or1010051251044:~#



Thanks,
Pardhiv Karri

On Tue, May 22, 2018 at 9:16 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
This is all weird. Maybe the OSD just doesn't have any PGs with data on them. Can you send the output of `ceph df`, how many PGs you have in each pool, and which PGs are on osd.38?


On Tue, May 22, 2018, 11:19 AM Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
Hi David,



root@or1010051251044:~# ceph osd tree
ID  WEIGHT   TYPE NAME                    UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -1 80.00000 root default                                                   
 -2 40.00000     rack rack_A1                                               
 -3 20.00000         host or1010051251040                                   
  0  2.00000             osd.0                 up  1.00000          1.00000 
  1  2.00000             osd.1                 up  1.00000          1.00000 
  2  2.00000             osd.2                 up  1.00000          1.00000 
  3  2.00000             osd.3                 up  1.00000          1.00000 
  4  2.00000             osd.4                 up  1.00000          1.00000 
  5  2.00000             osd.5                 up  1.00000          1.00000 
  6  2.00000             osd.6                 up  1.00000          1.00000 
  7  2.00000             osd.7                 up  1.00000          1.00000 
  8  2.00000             osd.8                 up  1.00000          1.00000 
  9  2.00000             osd.9                 up  1.00000          1.00000 
 -8 20.00000         host or1010051251044                                   
 30  2.00000             osd.30                up  1.00000          1.00000 
 31  2.00000             osd.31                up  1.00000          1.00000 
 32  2.00000             osd.32                up  1.00000          1.00000 
 33  2.00000             osd.33                up  1.00000          1.00000 
 34  2.00000             osd.34                up  1.00000          1.00000 
 35  2.00000             osd.35                up  1.00000          1.00000 
 36  2.00000             osd.36                up  1.00000          1.00000 
 37  2.00000             osd.37                up  1.00000          1.00000 
 38  2.00000             osd.38                up  1.00000          1.00000 
 39  2.00000             osd.39                up  1.00000          1.00000 
 -4 20.00000     rack rack_B1                                               
 -5 20.00000         host or1010051251041                                   
 10  2.00000             osd.10                up  1.00000          1.00000 
 11  2.00000             osd.11                up  1.00000          1.00000 
 12  2.00000             osd.12                up  1.00000          1.00000 
 13  2.00000             osd.13                up  1.00000          1.00000 
 14  2.00000             osd.14                up  1.00000          1.00000 
 15  2.00000             osd.15                up  1.00000          1.00000 
 16  2.00000             osd.16                up  1.00000          1.00000 
 17  2.00000             osd.17                up  1.00000          1.00000 
 18  2.00000             osd.18                up  1.00000          1.00000 
 19  2.00000             osd.19                up  1.00000          1.00000 
 -9        0         host or1010051251045                                   
 -6 20.00000     rack rack_C1                                               
 -7 20.00000         host or1010051251042                                   
 20  2.00000             osd.20                up  1.00000          1.00000 
 21  2.00000             osd.21                up  1.00000          1.00000 
 22  2.00000             osd.22                up  1.00000          1.00000 
 23  2.00000             osd.23                up  1.00000          1.00000 
 24  2.00000             osd.24                up  1.00000          1.00000 
 25  2.00000             osd.25                up  1.00000          1.00000 
 26  2.00000             osd.26                up  1.00000          1.00000 
 27  2.00000             osd.27                up  1.00000          1.00000 
 28  2.00000             osd.28                up  1.00000          1.00000 
 29  2.00000             osd.29                up  1.00000          1.00000 
-10        0         host or1010051251046                                   
-11        0         host or1010051251023                                   
root@or1010051251044:~#





root@or1010051251044:~# ceph -s
    cluster 6eacac66-087a-464d-94cb-9ca2585b98d5
     health HEALTH_OK
            election epoch 144, quorum 0,1,2 or1010051251037,or1010051251038,or1010051251039
     osdmap e1814: 40 osds: 40 up, 40 in
      pgmap v446581: 1728 pgs, 4 pools, 7389 GB data, 1847 kobjects
            22221 GB used, 57472 GB / 79793 GB avail
                1728 active+clean
  client io 61472 kB/s wr, 30 op/s
root@or1010051251044:~#


Thanks,
Pardhiv Karri

On Tue, May 22, 2018 at 5:01 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
Can you also post the output of `ceph osd tree` and `ceph status`?

On Tue, May 22, 2018, 3:05 AM Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
Hi,

We are using Ceph Hammer 0.94.9. Some of our OSDs never get any data or PGs, even though they are up, running, and at their full CRUSH weight. The rest of the OSDs are at 50% full. Is there a bug in Hammer that causes this issue? Would upgrading to Jewel or Luminous fix it?

I tried deleting and recreating this OSD many times and still see the same issue. I am seeing this in 3 of our 4 Ceph clusters, which are in different datacenters. We use HDDs for the OSDs and SSDs for the journals.

The output below is from our lab, where osd.38 is the one that never fills.


ID  WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  TYPE NAME                    
 -1 80.00000        -      0      0      0     0    0 root default                 
 -2 40.00000        - 39812G  6190G 33521G 15.55 0.68     rack rack_A1             
 -3 20.00000        - 19852G  3718G 16134G 18.73 0.82         host or1010051251040 
  0  2.00000  1.00000  1861G   450G  1410G 24.21 1.07             osd.0            
  1  2.00000  1.00000  1999G   325G  1673G 16.29 0.72             osd.1            
  2  2.00000  1.00000  1999G   336G  1662G 16.85 0.74             osd.2            
  3  2.00000  1.00000  1999G   386G  1612G 19.35 0.85             osd.3            
  4  2.00000  1.00000  1999G   385G  1613G 19.30 0.85             osd.4            
  5  2.00000  1.00000  1999G   364G  1634G 18.21 0.80             osd.5            
  6  2.00000  1.00000  1999G   319G  1679G 15.99 0.70             osd.6            
  7  2.00000  1.00000  1999G   434G  1564G 21.73 0.96             osd.7            
  8  2.00000  1.00000  1999G   352G  1646G 17.63 0.78             osd.8            
  9  2.00000  1.00000  1999G   362G  1636G 18.12 0.80             osd.9            
 -8 20.00000        - 19959G  2472G 17387G 12.39 0.55         host or1010051251044 
 30  2.00000  1.00000  1999G   362G  1636G 18.14 0.80             osd.30           
 31  2.00000  1.00000  1999G   293G  1705G 14.66 0.65             osd.31           
 32  2.00000  1.00000  1999G   202G  1796G 10.12 0.45             osd.32           
 33  2.00000  1.00000  1999G   215G  1783G 10.76 0.47             osd.33           
 34  2.00000  1.00000  1999G   192G  1806G  9.61 0.42             osd.34           
 35  2.00000  1.00000  1999G   337G  1661G 16.90 0.74             osd.35           
 36  2.00000  1.00000  1999G   206G  1792G 10.35 0.46             osd.36           
 37  2.00000  1.00000  1999G   266G  1732G 13.33 0.59             osd.37           
 38  2.00000  1.00000  1999G 55836k  1998G  0.00    0             osd.38           
 39  2.00000  1.00000  1968G   396G  1472G 20.12 0.89             osd.39           
 -4 20.00000        -      0      0      0     0    0     rack rack_B1             
 -5 20.00000        - 19990G  5978G 14011G 29.91 1.32         host or1010051251041 
 10  2.00000  1.00000  1999G   605G  1393G 30.27 1.33             osd.10           
 11  2.00000  1.00000  1999G   592G  1406G 29.62 1.30             osd.11           
 12  2.00000  1.00000  1999G   539G  1460G 26.96 1.19             osd.12           
 13  2.00000  1.00000  1999G   684G  1314G 34.22 1.51             osd.13           
 14  2.00000  1.00000  1999G   510G  1488G 25.56 1.13             osd.14           
 15  2.00000  1.00000  1999G   590G  1408G 29.52 1.30             osd.15           
 16  2.00000  1.00000  1999G   595G  1403G 29.80 1.31             osd.16           
 17  2.00000  1.00000  1999G   652G  1346G 32.64 1.44             osd.17           
 18  2.00000  1.00000  1999G   544G  1454G 27.23 1.20             osd.18           
 19  2.00000  1.00000  1999G   665G  1333G 33.27 1.46             osd.19           
 -9        0        -      0      0      0     0    0         host or1010051251045 
 -6 20.00000        -      0      0      0     0    0     rack rack_C1             
 -7 20.00000        - 19990G  5956G 14033G 29.80 1.31         host or1010051251042 
 20  2.00000  1.00000  1999G   701G  1297G 35.11 1.55             osd.20           
 21  2.00000  1.00000  1999G   573G  1425G 28.70 1.26             osd.21           
 22  2.00000  1.00000  1999G   652G  1346G 32.64 1.44             osd.22           
 23  2.00000  1.00000  1999G   612G  1386G 30.62 1.35             osd.23           
 24  2.00000  1.00000  1999G   614G  1384G 30.74 1.35             osd.24           
 25  2.00000  1.00000  1999G   561G  1437G 28.11 1.24             osd.25           
 26  2.00000  1.00000  1999G   558G  1440G 27.93 1.23             osd.26           
 27  2.00000  1.00000  1999G   610G  1388G 30.52 1.34             osd.27           
 28  2.00000  1.00000  1999G   515G  1483G 25.81 1.14             osd.28           
 29  2.00000  1.00000  1999G   555G  1443G 27.78 1.22             osd.29           
-10        0        -      0      0      0     0    0         host or1010051251046 
-11        0        -      0      0      0     0    0         host or1010051251023 
                TOTAL 79793G 18126G 61566G 22.72                                   
MIN/MAX VAR: 0/1.55  STDDEV: 8.26
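
To spot an OSD like osd.38 without reading the whole listing, a small filter on the VAR column is enough. This is a sketch that assumes the nine-column layout shown above, which may differ in other Ceph releases. The threshold and the function name are arbitrary choices for illustration, and the sample is a handful of rows copied from the output.

```python
# A few rows copied from the listing above (one host row, four OSD rows).
SAMPLE = """\
 -8 20.00000        - 19959G  2472G 17387G 12.39 0.55         host or1010051251044
 37  2.00000  1.00000  1999G   266G  1732G 13.33 0.59             osd.37
 38  2.00000  1.00000  1999G 55836k  1998G  0.00    0             osd.38
 39  2.00000  1.00000  1968G   396G  1472G 20.12 0.89             osd.39
 10  2.00000  1.00000  1999G   605G  1393G 30.27 1.33             osd.10
"""


def low_var_osds(text, threshold=0.1):
    """Return (name, VAR) for OSD rows whose VAR is below the threshold."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # OSD rows have exactly 9 columns and a name like "osd.N";
        # bucket rows (root, rack, host) have an extra TYPE column.
        if len(fields) != 9 or not fields[-1].startswith("osd."):
            continue
        var = float(fields[7])
        if var < threshold:
            flagged.append((fields[-1], var))
    return flagged


for name, var in low_var_osds(SAMPLE):
    print(f"{name}: VAR {var}")
```

On the sample rows this flags only osd.38, whose VAR is 0 while its host-mates sit between roughly 0.4 and 0.9.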


Thanks
Pardhiv karri


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


