Re: ceph infernalis pg creating forever

Here's the actual crush map:

$ cat /home/ceph/actual_map.out

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host cibn05 {
    id -2        # do not change unnecessarily
    # weight 5.780
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 0.722
    item osd.1 weight 0.722
    item osd.2 weight 0.722
    item osd.3 weight 0.722
    item osd.4 weight 0.722
    item osd.5 weight 0.722
    item osd.6 weight 0.722
    item osd.7 weight 0.722
}
host cibn06 {
    id -3        # do not change unnecessarily
    # weight 5.780
    alg straw
    hash 0    # rjenkins1
    item osd.8 weight 0.722
    item osd.9 weight 0.722
    item osd.10 weight 0.722
    item osd.11 weight 0.722
    item osd.12 weight 0.722
    item osd.13 weight 0.722
    item osd.14 weight 0.722
    item osd.15 weight 0.722
}
host cibn07 {
    id -4        # do not change unnecessarily
    # weight 5.780
    alg straw
    hash 0    # rjenkins1
    item osd.16 weight 0.722
    item osd.17 weight 0.722
    item osd.18 weight 0.722
    item osd.19 weight 0.722
    item osd.20 weight 0.722
    item osd.21 weight 0.722
    item osd.22 weight 0.722
    item osd.23 weight 0.722
}
host cibn08 {
    id -5        # do not change unnecessarily
    # weight 5.780
    alg straw
    hash 0    # rjenkins1
    item osd.24 weight 0.722
    item osd.25 weight 0.722
    item osd.26 weight 0.722
    item osd.27 weight 0.722
    item osd.28 weight 0.722
    item osd.29 weight 0.722
    item osd.30 weight 0.722
    item osd.31 weight 0.722
}
root default {
    id -1        # do not change unnecessarily
    # weight 23.120
    alg straw
    hash 0    # rjenkins1
    item cibn05 weight 5.780
    item cibn06 weight 5.780
    item cibn07 weight 5.780
    item cibn08 weight 5.780
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

The kernel version on the mon and OSD servers is 3.19.0-25-generic.
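
Following Greg's suggestion to check what the rule is doing, my understanding is that the map can be compiled back and exercised with crushtool, roughly like this (the .bin filename is just an example, and --num-rep should match the pool's replica size, which I believe defaults to 3):

$ crushtool -c /home/ceph/actual_map.out -o /home/ceph/actual_map.bin
$ crushtool -i /home/ceph/actual_map.bin --test --rule 0 --num-rep 3 --show-mappings
$ crushtool -i /home/ceph/actual_map.bin --test --rule 0 --num-rep 3 --show-bad-mappings

If that reports bad mappings, the rule itself can't be satisfied by this map; otherwise the rule looks fine and the problem is elsewhere.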


Regards,


German

2015-11-20 12:56 GMT-03:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
This usually means your crush mapping for the pool in question is unsatisfiable. Check what the rule is doing.
-Greg


On Friday, November 20, 2015, German Anders <ganders@xxxxxxxxxxxx> wrote:
Hi all, I've finished installing a new Ceph cluster with the Infernalis 9.2.0 release, but I'm getting the following warning:

$ ceph -w
    cluster 29xxxxxx-3xxx-xxx9-xxx7-xxxxxxxxb8xx
     health HEALTH_WARN
            64 pgs degraded
            64 pgs stale
            64 pgs stuck degraded
            1024 pgs stuck inactive
            64 pgs stuck stale
            1024 pgs stuck unclean
            64 pgs stuck undersized
            64 pgs undersized
            pool rbd pg_num 1024 > pgp_num 64
     monmap e1: 3 mons at {cibm01=172.23.16.1:6789/0,cibm02=172.23.16.2:6789/0,cibm03=172.23.16.3:6789/0}
            election epoch 6, quorum 0,1,2 cibm01,cibm02,cibm03
     osdmap e113: 32 osds: 32 up, 32 in
            flags sortbitwise
      pgmap v1264: 1024 pgs, 1 pools, 0 bytes data, 0 objects
            1344 MB used, 23673 GB / 23675 GB avail
                 960 creating
                  64 stale+undersized+degraded+peered

2015-11-20 10:22:27.947850 mon.0 [INF] pgmap v1264: 1024 pgs: 960 creating, 64 stale+undersized+degraded+peered; 0 bytes data, 1344 MB used, 23673 GB / 23675 GB avail


It seems like it's 'creating' the new PGs, but it has been stuck in this state for a while with no changes at all. Here's the ceph health detail output as well:

$ ceph health
HEALTH_WARN 64 pgs degraded; 64 pgs stale; 64 pgs stuck degraded; 1024 pgs stuck inactive; 64 pgs stuck stale; 1024 pgs stuck unclean; 64 pgs stuck undersized; 64 pgs undersized; pool rbd pg_num 1024 > pgp_num 64

$ ceph health detail
(...)
pg 0.31 is stuck stale for 85287.493086, current state stale+undersized+degraded+peered, last acting [0]
pg 0.32 is stuck stale for 85287.493090, current state stale+undersized+degraded+peered, last acting [0]
pg 0.33 is stuck stale for 85287.493093, current state stale+undersized+degraded+peered, last acting [0]
pg 0.34 is stuck stale for 85287.493097, current state stale+undersized+degraded+peered, last acting [0]
pg 0.35 is stuck stale for 85287.493101, current state stale+undersized+degraded+peered, last acting [0]
pg 0.36 is stuck stale for 85287.493105, current state stale+undersized+degraded+peered, last acting [0]
pg 0.37 is stuck stale for 85287.493110, current state stale+undersized+degraded+peered, last acting [0]
pg 0.38 is stuck stale for 85287.493114, current state stale+undersized+degraded+peered, last acting [0]
pg 0.39 is stuck stale for 85287.493119, current state stale+undersized+degraded+peered, last acting [0]
pg 0.3a is stuck stale for 85287.493123, current state stale+undersized+degraded+peered, last acting [0]
pg 0.3b is stuck stale for 85287.493127, current state stale+undersized+degraded+peered, last acting [0]
pg 0.3c is stuck stale for 85287.493131, current state stale+undersized+degraded+peered, last acting [0]
pg 0.3d is stuck stale for 85287.493135, current state stale+undersized+degraded+peered, last acting [0]
pg 0.3e is stuck stale for 85287.493139, current state stale+undersized+degraded+peered, last acting [0]
pg 0.3f is stuck stale for 85287.493149, current state stale+undersized+degraded+peered, last acting [0]
pool rbd pg_num 1024 > pgp_num 64

If I try to increase pgp_num, it says:

$ ceph osd pool set rbd pgp_num 1024
Error EBUSY: currently creating pgs, wait
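
For reference, I understand the current values for the pool can be double-checked with:

$ ceph osd pool get rbd pg_num
$ ceph osd pool get rbd pgp_num

but setting pgp_num is refused with EBUSY as long as the PGs are still in the creating state.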

$ ceph osd lspools
0 rbd,

$ ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 23.11963 root default                                     
-2  5.77991     host cibn05                                  
 0  0.72249         osd.0        up  1.00000          1.00000
 1  0.72249         osd.1        up  1.00000          1.00000
 2  0.72249         osd.2        up  1.00000          1.00000
 3  0.72249         osd.3        up  1.00000          1.00000
 4  0.72249         osd.4        up  1.00000          1.00000
 5  0.72249         osd.5        up  1.00000          1.00000
 6  0.72249         osd.6        up  1.00000          1.00000
 7  0.72249         osd.7        up  1.00000          1.00000
-3  5.77991     host cibn06                                  
 8  0.72249         osd.8        up  1.00000          1.00000
 9  0.72249         osd.9        up  1.00000          1.00000
10  0.72249         osd.10       up  1.00000          1.00000
11  0.72249         osd.11       up  1.00000          1.00000
12  0.72249         osd.12       up  1.00000          1.00000
13  0.72249         osd.13       up  1.00000          1.00000
14  0.72249         osd.14       up  1.00000          1.00000
15  0.72249         osd.15       up  1.00000          1.00000
-4  5.77991     host cibn07                                  
16  0.72249         osd.16       up  1.00000          1.00000
17  0.72249         osd.17       up  1.00000          1.00000
18  0.72249         osd.18       up  1.00000          1.00000
19  0.72249         osd.19       up  1.00000          1.00000
20  0.72249         osd.20       up  1.00000          1.00000
21  0.72249         osd.21       up  1.00000          1.00000
22  0.72249         osd.22       up  1.00000          1.00000
23  0.72249         osd.23       up  1.00000          1.00000
-5  5.77991     host cibn08                                  
24  0.72249         osd.24       up  1.00000          1.00000
25  0.72249         osd.25       up  1.00000          1.00000
26  0.72249         osd.26       up  1.00000          1.00000
27  0.72249         osd.27       up  1.00000          1.00000
28  0.72249         osd.28       up  1.00000          1.00000
29  0.72249         osd.29       up  1.00000          1.00000
30  0.72249         osd.30       up  1.00000          1.00000
31  0.72249         osd.31       up  1.00000          1.00000


$ ceph pg dump
(...)
pool 0    0    0    0    0    0    0    0    0
 sum    0    0    0    0    0    0    0    0
osdstat    kbused    kbavail    kb    hb in    hb out
31    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,30]    []
30    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,29]    []
29    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,28]    []
28    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,27]    []
27    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,26]    []
26    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,25]    []
25    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,24]    []
24    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,23]    []
23    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,22]    []
22    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,21]    []
9    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8]    []
8    36896    775752376    775789272    [0,1,2,3,4,5,6,7]    []
7    36896    775752376    775789272    [0,3,5,6]    []
6    36896    775752376    775789272    [0,3,5,7]    []
5    36896    775752376    775789272    [0,3,6,7]    []
4    36896    775752376    775789272    [0,1,2,3,5,6,7]    []
3    36896    775752376    775789272    [0,5,6,7]    []
2    36896    775752376    775789272    [0,1,3,4,5,6,7]    []
1    36896    775752376    775789272    [0,2,3,4,5,6,7]    []
0    36900    775752372    775789272    [3,5,6,7]    []
10    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,9]    []
11    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,10]    []
12    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,11]    []
13    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,12]    []
14    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,13]    []
15    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,14]    []
16    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,15]    []
17    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,16]    []
18    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,17]    []
19    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,18]    []
20    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,19]    []
21    36896    775752376    775789272    [0,1,2,3,4,5,6,7,8,20]    []
 sum    1180676    24824076028    24825256704



Any ideas?

Thanks in advance,

German

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
