It seems that you played around with the crushmap and did something wrong.
Compare the output of 'ceph osd tree' with the crushmap: some 'osd' devices have been renamed to 'device' there, and I think that is your problem.
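If it helps, this is the usual way to pull the map out for a side-by-side check, using the standard ceph/crushtool commands (the file paths are just examples):

ceph osd getcrushmap -o /tmp/crushmap.bin              # dump the compiled crushmap
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt    # decompile it to text
ceph osd tree                                          # compare against the device/bucket entries in the text map

# After editing the text map, recompile and inject it back:
crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
ceph osd setcrushmap -i /tmp/crushmap.new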
Sent from a mobile device.
-----Original Message-----
From: Vasiliy Angapov <angapov@xxxxxxxxx>
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Sent: Thu, 26 Nov 2015 7:53
Subject: [ceph-users] Undersized pgs problem
Hi, colleagues!
I have a small 4-node Ceph cluster (0.94.2); all pools have size 3 and min_size 1.
Last night one host failed, and the cluster was unable to rebalance,
reporting a lot of undersized PGs.
root@slpeah002:[~]:# ceph -s
cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
health HEALTH_WARN
1486 pgs degraded
1486 pgs stuck degraded
2257 pgs stuck unclean
1486 pgs stuck undersized
1486 pgs undersized
recovery 80429/555185 objects degraded (14.487%)
recovery 40079/555185 objects misplaced (7.219%)
4/20 in osds are down
1 mons down, quorum 1,2 slpeah002,slpeah007
monmap e7: 3 mons at
{slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
election epoch 710, quorum 1,2 slpeah002,slpeah007
osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
3366 GB used, 93471 GB / 96838 GB avail
80429/555185 objects degraded (14.487%)
40079/555185 objects misplaced (7.219%)
1903 active+clean
1486 active+undersized+degraded
771 active+remapped
client io 0 B/s rd, 246 kB/s wr, 67 op/s
root@slpeah002:[~]:# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 94.63998 root default
-9 32.75999 host slpeah007
72 5.45999 osd.72 up 1.00000 1.00000
73 5.45999 osd.73 up 1.00000 1.00000
74 5.45999 osd.74 up 1.00000 1.00000
75 5.45999 osd.75 up 1.00000 1.00000
76 5.45999 osd.76 up 1.00000 1.00000
77 5.45999 osd.77 up 1.00000 1.00000
-10 32.75999 host slpeah008
78 5.45999 osd.78 up 1.00000 1.00000
79 5.45999 osd.79 up 1.00000 1.00000
80 5.45999 osd.80 up 1.00000 1.00000
81 5.45999 osd.81 up 1.00000 1.00000
82 5.45999 osd.82 up 1.00000 1.00000
83 5.45999 osd.83 up 1.00000 1.00000
-3 14.56000 host slpeah001
1 3.64000 osd.1 down 1.00000 1.00000
33 3.64000 osd.33 down 1.00000 1.00000
34 3.64000 osd.34 down 1.00000 1.00000
35 3.64000 osd.35 down 1.00000 1.00000
-2 14.56000 host slpeah002
0 3.64000 osd.0 up 1.00000 1.00000
36 3.64000 osd.36 up 1.00000 1.00000
37 3.64000 osd.37 up 1.00000 1.00000
38 3.64000 osd.38 up 1.00000 1.00000
Crushmap:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0
device 1 osd.1
device 2 device2
device 3 device3
device 4 device4
device 5 device5
device 6 device6
device 7 device7
device 8 device8
device 9 device9
device 10 device10
device 11 device11
device 12 device12
device 13 device13
device 14 device14
device 15 device15
device 16 device16
device 17 device17
device 18 device18
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 device24
device 25 device25
device 26 device26
device 27 device27
device 28 device28
device 29 device29
device 30 device30
device 31 device31
device 32 device32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 device39
device 40 device40
device 41 device41
device 42 device42
device 43 device43
device 44 device44
device 45 device45
device 46 device46
device 47 device47
device 48 device48
device 49 device49
device 50 device50
device 51 device51
device 52 device52
device 53 device53
device 54 device54
device 55 device55
device 56 device56
device 57 device57
device 58 device58
device 59 device59
device 60 device60
device 61 device61
device 62 device62
device 63 device63
device 64 device64
device 65 device65
device 66 device66
device 67 device67
device 68 device68
device 69 device69
device 70 device70
device 71 device71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77
device 78 osd.78
device 79 osd.79
device 80 osd.80
device 81 osd.81
device 82 osd.82
device 83 osd.83
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host slpeah007 {
id -9 # do not change unnecessarily
# weight 32.760
alg straw
hash 0 # rjenkins1
item osd.72 weight 5.460
item osd.73 weight 5.460
item osd.74 weight 5.460
item osd.75 weight 5.460
item osd.76 weight 5.460
item osd.77 weight 5.460
}
host slpeah008 {
id -10 # do not change unnecessarily
# weight 32.760
alg straw
hash 0 # rjenkins1
item osd.78 weight 5.460
item osd.79 weight 5.460
item osd.80 weight 5.460
item osd.81 weight 5.460
item osd.82 weight 5.460
item osd.83 weight 5.460
}
host slpeah001 {
id -3 # do not change unnecessarily
# weight 14.560
alg straw
hash 0 # rjenkins1
item osd.1 weight 3.640
item osd.33 weight 3.640
item osd.34 weight 3.640
item osd.35 weight 3.640
}
host slpeah002 {
id -2 # do not change unnecessarily
# weight 14.560
alg straw
hash 0 # rjenkins1
item osd.0 weight 3.640
item osd.36 weight 3.640
item osd.37 weight 3.640
item osd.38 weight 3.640
}
root default {
id -1 # do not change unnecessarily
# weight 94.640
alg straw
hash 0 # rjenkins1
item slpeah007 weight 32.760
item slpeah008 weight 32.760
item slpeah001 weight 14.560
item slpeah002 weight 14.560
}
# rules
rule default {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
This is odd, because the pools have size 3 and I have 3 hosts alive, so why
is it saying that undersized PGs are present? It makes me feel like
CRUSH is not working properly.
There is not much data in the cluster at the moment, only about 3 TB, and
as you can see from the osd tree, each host has at least 14 TB of disk
space on its OSDs.
So I'm a bit stuck now...
How can I find the source of the trouble?
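In case it matters, these are the checks I was planning to run next (the pg id
in the query example is just a placeholder):

ceph health detail | grep undersized    # list the undersized PGs
ceph pg dump_stuck unclean              # show stuck PGs with their up/acting OSD sets
ceph pg 4.3a query                      # placeholder pgid: inspect up/acting sets and recovery state

# Test whether the current CRUSH rule can actually place 3 replicas:
ceph osd getcrushmap -o /tmp/cm.bin
crushtool -i /tmp/cm.bin --test --rule 0 --num-rep 3 --show-bad-mappings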
Thanks in advance!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com