There are 16 hosts in the root associated with that EC rule.
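For what it's worth, 2147483647 does not look like a real OSD id at all: it is INT32_MAX (0x7fffffff), which, as far as I can tell, is the value Ceph defines as CRUSH_ITEM_NONE and reports in an acting set when CRUSH cannot map that shard to any OSD. A quick sanity check (the acting set is copied from pg 50.c3 below):

```python
# 2147483647 in an acting set is not an OSD id; it appears to be the
# CRUSH_ITEM_NONE placeholder (0x7fffffff in Ceph's crush.h), emitted
# when CRUSH fails to find an OSD for that shard.
MYSTERY_ID = 2147483647

assert MYSTERY_ID == 0x7FFFFFFF  # INT32_MAX
assert MYSTERY_ID == 2**31 - 1

# Flag placeholder entries in a pg's acting set, e.g. pg 50.c3:
acting = [275, 282, 330, 25, 154, 98, 239, 2147483647, 75, 49]
missing_shards = [i for i, osd in enumerate(acting) if osd == MYSTERY_ID]
print(missing_shards)  # shard indexes with no OSD assigned -> [7]
```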
[ceph-admin@admin libr-cluster]$ ceph osd lspools
1 cephfs_data,2 cephfs_metadata,35 vmware_rep,36 rbd,38 one,44 nvme,48 iscsi-primary,49 iscsi-secondary,50 it_share,55 vmware_ssd,56 vmware_ssd_metadata,57 vmware_ssd_2_1,
[ceph-admin@admin libr-cluster]$ ceph osd tree
.........................
-75 261.88696 root nonscientific
-90 16.36794 host osd0-nonscientific
73 hdd 5.45598 osd.73 up 1.00000 1.00000
108 hdd 5.45598 osd.108 up 1.00000 1.00000
130 hdd 5.45598 osd.130 up 1.00000 1.00000
-93 16.36794 host osd1-nonscientific
131 hdd 5.45598 osd.131 up 1.00000 1.00000
132 hdd 5.45598 osd.132 up 1.00000 1.00000
154 hdd 5.45598 osd.154 up 1.00000 1.00000
-108 16.36794 host osd10-nonscientific
258 hdd 5.45598 osd.258 up 1.00000 1.00000
274 hdd 5.45598 osd.274 up 1.00000 1.00000
275 hdd 5.45598 osd.275 up 1.00000 1.00000
-76 16.36794 host osd11-nonscientific
1 hdd 5.45598 osd.1 up 1.00000 1.00000
2 hdd 5.45598 osd.2 up 1.00000 1.00000
281 hdd 5.45598 osd.281 up 1.00000 1.00000
-111 16.36794 host osd12-nonscientific
307 hdd 5.45598 osd.307 up 1.00000 1.00000
308 hdd 5.45598 osd.308 up 1.00000 1.00000
330 hdd 5.45598 osd.330 up 1.00000 1.00000
-102 16.36794 host osd13-nonscientific
215 hdd 5.45598 osd.215 up 1.00000 1.00000
216 hdd 5.45598 osd.216 up 1.00000 1.00000
238 hdd 5.45598 osd.238 up 1.00000 1.00000
-105 16.36794 host osd14-nonscientific
239 hdd 5.45598 osd.239 up 1.00000 1.00000
286 hdd 5.45598 osd.286 up 1.00000 1.00000
306 hdd 5.45598 osd.306 up 1.00000 1.00000
-150 16.36794 host osd15-nonscientific
282 hdd 5.45598 osd.282 up 1.00000 1.00000
283 hdd 5.45598 osd.283 up 1.00000 1.00000
376 hdd 5.45598 osd.376 up 1.00000 1.00000
-151 16.36794 host osd16-nonscientific
377 hdd 5.45598 osd.377 up 1.00000 1.00000
378 hdd 5.45598 osd.378 up 1.00000 1.00000
400 hdd 5.45598 osd.400 up 1.00000 1.00000
-81 16.36794 host osd2-nonscientific
24 hdd 5.45598 osd.24 up 1.00000 1.00000
25 hdd 5.45598 osd.25 up 1.00000 1.00000
47 hdd 5.45598 osd.47 up 1.00000 1.00000
-96 16.36794 host osd4-nonscientific
168 hdd 5.45598 osd.168 up 1.00000 1.00000
169 hdd 5.45598 osd.169 up 1.00000 1.00000
191 hdd 5.45598 osd.191 up 1.00000 1.00000
-169 16.36794 host osd5-nonscientific
48 hdd 5.45598 osd.48 up 1.00000 1.00000
49 hdd 5.45598 osd.49 up 1.00000 1.00000
71 hdd 5.45598 osd.71 up 1.00000 1.00000
-87 16.36794 host osd6-nonscientific
98 hdd 5.45598 osd.98 up 1.00000 1.00000
99 hdd 5.45598 osd.99 up 1.00000 1.00000
167 hdd 5.45598 osd.167 up 1.00000 1.00000
-84 16.36794 host osd7-nonscientific
72 hdd 5.45598 osd.72 up 1.00000 1.00000
75 hdd 5.45598 osd.75 up 1.00000 1.00000
97 hdd 5.45598 osd.97 up 1.00000 1.00000
-99 16.36794 host osd8-nonscientific
184 hdd 5.45598 osd.184 up 0.85004 1.00000
192 hdd 5.45598 osd.192 up 1.00000 1.00000
214 hdd 5.45598 osd.214 down 0 1.00000
-257 16.36794 host osd9-nonscientific
425 hdd 5.45598 osd.425 up 1.00000 1.00000
426 hdd 5.45598 osd.426 up 1.00000 1.00000
427 hdd 5.45598 osd.427 up 1.00000 1.00000
.........................
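Counting the `host` buckets under the nonscientific root in the tree output above is how I get 16; a throwaway parse like this (shown here on a two-host excerpt, the full paste gives 16):

```python
# Count host buckets in `ceph osd tree` output. Only a two-host
# excerpt of the paste above is inlined here for brevity; feeding
# the full output through the same filter reports 16 hosts.
tree_output = """\
-90 16.36794 host osd0-nonscientific
73 hdd 5.45598 osd.73 up 1.00000 1.00000
-93 16.36794 host osd1-nonscientific
131 hdd 5.45598 osd.131 up 1.00000 1.00000
"""

hosts = [line.split()[-1]              # bucket name is the last field
         for line in tree_output.splitlines()
         if " host " in f" {line} "]   # keep only host bucket lines
print(len(hosts), hosts)  # -> 2 ['osd0-nonscientific', 'osd1-nonscientific']
```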
On Wed, Oct 24, 2018 at 2:31 PM Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
I think you don't have enough hosts for your EC pool CRUSH rule.
If your failure domain is host, then you need at least ten hosts.
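The arithmetic behind that: with a failure domain of host, CRUSH places each EC shard on a distinct host, so a k+m profile needs at least k+m hosts. The acting sets in the pg output below have 10 entries, so the pool appears to be k+m = 10 (the exact k/m split isn't shown; 8+2 is used in the sketch purely as an example):

```python
# With failure domain = host, every EC shard lands on a different
# host, so placement needs at least k + m hosts.
def min_hosts_for_ec(k: int, m: int) -> int:
    return k + m

# Acting set of pg 50.d4 from the health output below: 10 shards,
# so k + m = 10 for this pool.
acting = [239, 275, 307, 49, 184, 25, 281, 2147483647, 283, 378]
assert len(acting) == 10

hosts_in_root = 16  # per the `ceph osd tree` output in the reply
print(hosts_in_root >= min_hosts_for_ec(8, 2))  # -> True (if profile is 8+2)
```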
On Wed, Oct 24, 2018 at 9:39 PM Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>
> My cluster (v12.2.8) is currently recovering and I noticed this odd OSD ID in ceph health detail:
> "2147483647"
>
> [ceph-admin@admin libr-cluster]$ ceph health detail | grep 2147483647
> pg 50.c3 is stuck undersized for 148638.689866, current state active+recovery_wait+undersized+degraded+remapped, last acting [275,282,330,25,154,98,239,2147483647,75,49]
> pg 50.d4 is stuck undersized for 148638.649657, current state active+recovery_wait+undersized+degraded+remapped, last acting [239,275,307,49,184,25,281,2147483647,283,378]
> pg 50.10b is stuck undersized for 148638.666901, current state active+undersized+degraded+remapped+backfill_wait, last acting [131,192,283,308,169,258,2147483647,75,306,25]
> pg 50.110 is stuck undersized for 148638.684818, current state active+recovery_wait+undersized+degraded+remapped, last acting [169,377,2147483647,2,274,47,306,192,131,283]
> pg 50.116 is stuck undersized for 148638.703043, current state active+recovery_wait+undersized+degraded+remapped, last acting [99,283,168,47,71,400,2147483647,108,239,2]
> pg 50.121 is stuck undersized for 148638.700838, current state active+undersized+degraded+remapped+backfill_wait, last acting [71,2,75,307,286,73,168,2147483647,376,25]
> pg 50.12a is stuck undersized for 145362.808035, current state active+undersized+degraded+remapped+backfill_wait, last acting [71,378,169,2147483647,192,308,131,108,239,97]
>
>
> [ceph-admin@admin libr-cluster]$ ceph osd metadata 2147483647
> Error ENOENT: osd.2147483647 does not exist
>
> Is this expected? If not, what should I do?
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com