Hi Ceph Admins,
Last night our Ceph cluster hit 100% full on all pools. This happened after osd.56 (95% used) reached the OSD_FULL state.
Ceph version: 12.2.2
Logs:
2018-03-03 17:15:22.560710 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224452 : cluster [ERR] overall HEALTH_ERR noscrub,nodeep-scrub flag(s) set; 1 backfillfull osd(s); 5 nearfull osd(s); 21 pool(s) backfillfull; 638551/287271738 objects misplaced (0.222%); Degraded data redundancy: 253066/287271738 objects degraded (0.088%), 25 pgs unclean; Degraded data redundancy (low space): 25 pgs backfill_toofull
2018-03-03 17:15:42.513194 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224515 : cluster [WRN] Health check update: 638576/287284518 objects misplaced (0.222%) (OBJECT_MISPLACED)
2018-03-03 17:15:42.513256 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224516 : cluster [WRN] Health check update: Degraded data redundancy: 253266/287284518 objects degraded (0.088%), 25 pgs unclean (PG_DEGRADED)
2018-03-03 17:15:44.684928 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224524 : cluster [ERR] Health check failed: 1 full osd(s) (OSD_FULL)
2018-03-03 17:15:44.684969 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224525 : cluster [WRN] Health check failed: 21 pool(s) full (POOL_FULL)
2018-03-03 17:15:44.684987 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224526 : cluster [INF] Health check cleared: OSD_BACKFILLFULL (was: 1 backfillfull osd(s))
2018-03-03 17:15:44.685013 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224527 : cluster [INF] Health check cleared: POOL_BACKFILLFULL (was: 21 pool(s) backfillfull)
# ceph df detail (at the time of the crash)
GLOBAL:
SIZE AVAIL RAW USED %RAW USED OBJECTS
381T 102T 278T 73.05 38035k
POOLS:
NAME ID QUOTA OBJECTS QUOTA BYTES USED %USED MAX AVAIL OBJECTS DIRTY READ WRITE RAW USED
rbd 0 N/A N/A 0 0 0 0 0 1 134k 0
vms 1 N/A N/A 0 0 0 0 0 0 0 0
images 2 N/A N/A 7659M 100.00 0 1022 1022 110k 5668 22977M
volumes 3 N/A N/A 40991G 100.00 0 10514980 10268k 3404M 4087M 120T
.rgw.root 4 N/A N/A 1588 100.00 0 4 4 402k 4 4764
default.rgw.control 5 N/A N/A 0 0 0 8 8 0 0 0
default.rgw.data.root 6 N/A N/A 94942 100.00 0 339 339 257k 6422 278k
default.rgw.gc 7 N/A N/A 0 0 0 32 32 3125M 7410k 0
default.rgw.log 8 N/A N/A 0 0 0 186 186 27222k 18146k 0
default.rgw.users.uid 9 N/A N/A 4252 100.00 0 17 17 262k 64561 12756
default.rgw.usage 10 N/A N/A 0 0 0 8 8 332k 665k 0
default.rgw.users.email 11 N/A N/A 87 100.00 0 4 4 0 4 261
default.rgw.users.keys 12 N/A N/A 206 100.00 0 11 11 459 23 618
default.rgw.users.swift 13 N/A N/A 40 100.00 0 3 3 0 3 120
default.rgw.buckets.index 14 N/A N/A 0 0 0 210 210 321M 41709k 0
default.rgw.buckets.non-ec 16 N/A N/A 0 0 0 114 114 18006 12055 0
default.rgw.buckets.extra 17 N/A N/A 0 0 0 0 0 0 0 0
.rgw.buckets.extra 18 N/A N/A 0 0 0 0 0 0 0 0
default.rgw.buckets.data 20 N/A N/A 104T 100.00 0 28334451 27670k 160M 156M 156T
benchmark_replicated 21 N/A N/A 87136M 100.00 0 21792 21792 1450k 4497k 255G
benchmark_erasure_coded 22 N/A N/A 292G 100.00 0 74779 74779 61288 680k 439G
#
What we did to reclaim some space (rough commands below):
- deleted the two benchmark pools
- reweighted the full osd.56 from 1.0 to 0.85
- added a new node, cephnode10 (the cluster grew from 9 to 10 nodes, but I had to crush reweight the new OSDs down to 0 because a lot of slow requests (3000+) appeared and customer IOPS dropped drastically; I am now adding one OSD at a time)
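For reference, roughly the commands used for the above (from memory, so treat this as a sketch; pool deletion also requires mon_allow_pool_delete=true):
# ceph osd pool delete benchmark_replicated benchmark_replicated --yes-i-really-really-mean-it
# ceph osd pool delete benchmark_erasure_coded benchmark_erasure_coded --yes-i-really-really-mean-it
# ceph osd reweight 56 0.85
# ceph osd crush reweight osd.109 0    # repeated for the other new OSDs on cephnode10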
Current status
# ceph -s
cluster:
id: 1023c49f-3a10-42de-9f62-9b122db32f1f
health: HEALTH_ERR
noscrub,nodeep-scrub flag(s) set
5 nearfull osd(s)
19 pool(s) nearfull
16151257/286563963 objects misplaced (5.636%)
Degraded data redundancy: 20949/286563963 objects degraded (0.007%), 431 pgs unclean, 28 pgs degraded, 1 pg undersized
Degraded data redundancy (low space): 15 pgs backfill_toofull
services:
mon: 3 daemons, quorum cephnode01,cephnode02,cephnode03
mgr: cephnode02(active), standbys: cephnode03, cephnode01
osd: 120 osds: 117 up, 117 in; 405 remapped pgs
flags noscrub,nodeep-scrub
rgw: 3 daemons active
data:
pools: 19 pools, 3760 pgs
objects: 37941k objects, 144 TB
usage: 278 TB used, 146 TB / 425 TB avail
pgs: 20949/286563963 objects degraded (0.007%)
16151257/286563963 objects misplaced (5.636%)
3329 active+clean
370 active+remapped+backfill_wait
26 active+recovery_wait+degraded
18 active+remapped+backfilling
15 active+remapped+backfill_wait+backfill_toofull
1 active+recovery_wait+degraded+remapped
1 active+undersized+degraded+remapped+backfilling
io:
client: 18337 B/s rd, 29269 kB/s wr, 1 op/s rd, 234 op/s wr
recovery: 946 MB/s, 243 objects/s
#
# ceph df detail
GLOBAL:
SIZE AVAIL RAW USED %RAW USED OBJECTS
425T 146T 278T 65.50 37941k
POOLS:
NAME ID QUOTA OBJECTS QUOTA BYTES USED %USED MAX AVAIL OBJECTS DIRTY READ WRITE RAW USED
rbd 0 N/A N/A 0 0 7415G 0 0 1 134k 0
vms 1 N/A N/A 0 0 7415G 0 0 0 0 0
images 2 N/A N/A 7659M 0.10 7415G 1022 1022 110k 5668 22445M
volumes 3 N/A N/A 40992G 84.68 7415G 10515231 10268k 3416M 4090M 120T
.rgw.root 4 N/A N/A 1588 0 7415G 4 4 141k 4 4764
default.rgw.control 5 N/A N/A 0 0 7415G 8 8 0 0 0
default.rgw.data.root 6 N/A N/A 94942 0 7415G 339 339 257k 6422 278k
default.rgw.gc 7 N/A N/A 0 0 7415G 32 32 3125M 7430k 0
default.rgw.log 8 N/A N/A 0 0 7415G 186 186 27249k 18164k 0
default.rgw.users.uid 9 N/A N/A 4252 0 7415G 17 17 263k 64577 12756
default.rgw.usage 10 N/A N/A 0 0 7415G 8 8 332k 665k 0
default.rgw.users.email 11 N/A N/A 87 0 7415G 4 4 0 4 261
default.rgw.users.keys 12 N/A N/A 206 0 7415G 11 11 483 23 580
default.rgw.users.swift 13 N/A N/A 40 0 7415G 3 3 0 3 120
default.rgw.buckets.index 14 N/A N/A 0 0 7415G 210 210 321M 41709k 0
default.rgw.buckets.non-ec 16 N/A N/A 0 0 7415G 114 114 18006 12055 0
default.rgw.buckets.extra 17 N/A N/A 0 0 7415G 0 0 0 0 0
.rgw.buckets.extra 18 N/A N/A 0 0 7415G 0 0 0 0 0
default.rgw.buckets.data 20 N/A N/A 104T 87.85 14831G 28334711 27670k 160M 156M 157T
#
The most utilized pools are volumes (a replicated pool) and default.rgw.buckets.data (an EC pool, k=6, m=3):
pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 10047 flags hashpspool,backfillfull stripe_width 0 application rbd
removed_snaps [1~3]
pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 10047 flags hashpspool,backfillfull stripe_width 4224 application rgw
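(The two pool definitions above are from ceph osd pool ls detail; the exact EC profile behind pool 20 can be dumped with ceph osd erasure-code-profile ls and ceph osd erasure-code-profile get <profile-name>.)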
CRUSH rules for the above pools:
# rules
rule replicated_ruleset {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type rack # !!! rack as failure domain
step emit
}
rule ec_rule_k6_m3 {
id 1
type erasure
min_size 3
max_size 9
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host # !!! host as failure domain
step emit
}
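The rules were taken from the decompiled crushmap, i.e. something like:
# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt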
And finally, the cluster topology:
# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME
-1 392.72797 - 425T 278T 146T 65.51 1.00 - root default
-6 392.72797 - 425T 278T 146T 65.51 1.00 - region region01
-5 392.72797 - 425T 278T 146T 65.51 1.00 - datacenter dc01
-4 392.72797 - 425T 278T 146T 65.51 1.00 - room room01
-8 43.63699 - 44684G 31703G 12980G 70.95 1.08 - rack rack01
-7 43.63699 - 44684G 31703G 12980G 70.95 1.08 - host cephnode01
0 hdd 3.63599 1.00000 3723G 2957G 765G 79.43 1.21 178 osd.0
2 hdd 3.63599 1.00000 3723G 2407G 1315G 64.66 0.99 157 osd.2
4 hdd 3.63599 1.00000 3723G 2980G 742G 80.05 1.22 184 osd.4
6 hdd 3.63599 1.00000 3723G 2768G 955G 74.34 1.13 170 osd.6
8 hdd 3.63599 1.00000 3723G 2704G 1019G 72.62 1.11 172 osd.8
11 hdd 3.63599 1.00000 3723G 2899G 824G 77.87 1.19 181 osd.11
12 hdd 3.63599 1.00000 3723G 2788G 935G 74.89 1.14 183 osd.12
14 hdd 3.63599 1.00000 3723G 2139G 1584G 57.44 0.88 139 osd.14
16 hdd 3.63599 1.00000 3723G 2672G 1050G 71.78 1.10 174 osd.16
18 hdd 3.63599 1.00000 3723G 2575G 1148G 69.17 1.06 166 osd.18
20 hdd 3.63599 1.00000 3723G 2395G 1328G 64.33 0.98 149 osd.20
22 hdd 3.63599 1.00000 3723G 2414G 1309G 64.83 0.99 161 osd.22
-3 43.63699 - 44684G 32329G 12354G 72.35 1.10 - rack rack02
-2 43.63699 - 44684G 32329G 12354G 72.35 1.10 - host cephnode02
1 hdd 3.63599 1.00000 3723G 2874G 848G 77.21 1.18 172 osd.1
3 hdd 3.63599 1.00000 3723G 3287G 436G 88.27 1.35 190 osd.3
5 hdd 3.63599 1.00000 3723G 2588G 1135G 69.50 1.06 151 osd.5
7 hdd 3.63599 1.00000 3723G 2566G 1156G 68.94 1.05 156 osd.7
9 hdd 3.63599 1.00000 3723G 2481G 1242G 66.65 1.02 164 osd.9
10 hdd 3.63599 1.00000 3723G 2622G 1101G 70.43 1.08 156 osd.10
13 hdd 3.63599 1.00000 3723G 2498G 1225G 67.08 1.02 150 osd.13
15 hdd 3.63599 1.00000 3723G 2664G 1058G 71.56 1.09 167 osd.15
17 hdd 3.63599 1.00000 3723G 2510G 1213G 67.42 1.03 163 osd.17
19 hdd 3.63599 1.00000 3723G 2562G 1161G 68.82 1.05 162 osd.19
21 hdd 3.63599 1.00000 3723G 2683G 1040G 72.05 1.10 169 osd.21
23 hdd 3.63599 1.00000 3723G 2989G 734G 80.28 1.23 169 osd.23
-10 43.63699 - 44684G 32556G 12128G 72.86 1.11 - rack rack03
-9 43.63699 - 44684G 32556G 12128G 72.86 1.11 - host cephnode03
24 hdd 3.63599 1.00000 3723G 2757G 966G 74.05 1.13 155 osd.24
25 hdd 3.63599 1.00000 3723G 3003G 720G 80.66 1.23 186 osd.25
26 hdd 3.63599 1.00000 3723G 2494G 1229G 66.98 1.02 168 osd.26
28 hdd 3.63599 1.00000 3723G 3021G 701G 81.15 1.24 180 osd.28
30 hdd 3.63599 1.00000 3723G 2554G 1169G 68.60 1.05 164 osd.30
32 hdd 3.63599 1.00000 3723G 2060G 1662G 55.34 0.84 147 osd.32
34 hdd 3.63599 1.00000 3723G 3131G 592G 84.08 1.28 181 osd.34
36 hdd 3.63599 1.00000 3723G 2512G 1211G 67.47 1.03 162 osd.36
38 hdd 3.63599 1.00000 3723G 2408G 1315G 64.68 0.99 157 osd.38
40 hdd 3.63599 1.00000 3723G 2997G 726G 80.49 1.23 194 osd.40
42 hdd 3.63599 1.00000 3723G 2645G 1078G 71.05 1.08 161 osd.42
44 hdd 3.63599 1.00000 3723G 2969G 754G 79.74 1.22 173 osd.44
-12 43.63699 - 44684G 32504G 12179G 72.74 1.11 - rack rack04
-11 43.63699 - 44684G 32504G 12179G 72.74 1.11 - host cephnode04
27 hdd 3.63599 1.00000 3723G 2947G 775G 79.16 1.21 186 osd.27
29 hdd 3.63599 1.00000 3723G 3095G 628G 83.13 1.27 175 osd.29
31 hdd 3.63599 1.00000 3723G 2514G 1209G 67.52 1.03 163 osd.31
33 hdd 3.63599 1.00000 3723G 2557G 1166G 68.68 1.05 160 osd.33
35 hdd 3.63599 1.00000 3723G 3215G 508G 86.35 1.32 183 osd.35
37 hdd 3.63599 1.00000 3723G 2455G 1268G 65.93 1.01 151 osd.37
39 hdd 3.63599 1.00000 3723G 2335G 1387G 62.73 0.96 155 osd.39
41 hdd 3.63599 1.00000 3723G 2774G 949G 74.51 1.14 165 osd.41
43 hdd 3.63599 1.00000 3723G 2764G 959G 74.24 1.13 169 osd.43
45 hdd 3.63599 1.00000 3723G 2553G 1169G 68.59 1.05 163 osd.45
46 hdd 3.63599 1.00000 3723G 2645G 1077G 71.06 1.08 167 osd.46
47 hdd 3.63599 1.00000 3723G 2644G 1079G 71.02 1.08 156 osd.47
-14 39.99585 - 33513G 27770G 5742G 82.86 1.26 - rack rack05
-13 39.99585 - 33513G 27770G 5742G 82.86 1.26 - host cephnode05
48 hdd 3.63599 0.90002 3723G 3310G 413G 88.89 1.36 211 osd.48
49 hdd 3.63599 0.80005 3723G 3029G 694G 81.36 1.24 182 osd.49
50 hdd 3.63599 0.85004 3723G 2918G 804G 78.38 1.20 167 osd.50
51 hdd 3.63599 0.85004 3723G 3103G 620G 83.33 1.27 186 osd.51
52 hdd 0 0 0 0 0 0 0 0 osd.52
53 hdd 3.63599 0 0 0 0 0 0 0 osd.53
54 hdd 3.63599 0 0 0 0 0 0 0 osd.54
55 hdd 3.63599 0.85004 3723G 3003G 720G 80.65 1.23 178 osd.55
56 hdd 3.63599 0.84999 3723G 3347G 376G 89.89 1.37 189 osd.56
57 hdd 3.63599 0.75006 3723G 2707G 1016G 72.71 1.11 161 osd.57
58 hdd 3.63599 0.80005 3723G 3228G 495G 86.71 1.32 186 osd.58
59 hdd 3.63599 0.80005 3723G 3122G 601G 83.85 1.28 194 osd.59
-16 43.63699 - 44684G 33402G 11281G 74.75 1.14 - rack rack06
-15 43.63699 - 44684G 33402G 11281G 74.75 1.14 - host cephnode06
60 hdd 3.63599 1.00000 3723G 2317G 1406G 62.22 0.95 149 osd.60
61 hdd 3.63599 1.00000 3723G 3039G 684G 81.62 1.25 183 osd.61
62 hdd 3.63599 1.00000 3723G 2945G 778G 79.09 1.21 189 osd.62
63 hdd 3.63599 1.00000 3723G 2923G 800G 78.50 1.20 166 osd.63
64 hdd 3.63599 1.00000 3723G 3057G 665G 82.11 1.25 180 osd.64
65 hdd 3.63599 1.00000 3723G 2989G 733G 80.30 1.23 170 osd.65
66 hdd 3.63599 1.00000 3723G 2764G 959G 74.25 1.13 166 osd.66
67 hdd 3.63599 1.00000 3723G 2811G 912G 75.50 1.15 175 osd.67
68 hdd 3.63599 1.00000 3723G 1785G 1938G 47.95 0.73 139 osd.68
69 hdd 3.63599 1.00000 3723G 2744G 979G 73.69 1.12 159 osd.69
70 hdd 3.63599 1.00000 3723G 3068G 655G 82.40 1.26 178 osd.70
71 hdd 3.63599 1.00000 3723G 2956G 767G 79.40 1.21 174 osd.71
-18 43.63699 - 44684G 33524G 11159G 75.03 1.15 - rack rack07
-17 43.63699 - 44684G 33524G 11159G 75.03 1.15 - host cephnode07
72 hdd 3.63599 1.00000 3723G 2901G 822G 77.91 1.19 178 osd.72
73 hdd 3.63599 1.00000 3723G 2612G 1110G 70.16 1.07 168 osd.73
74 hdd 3.63599 1.00000 3723G 2870G 853G 77.09 1.18 172 osd.74
75 hdd 3.63599 1.00000 3723G 2813G 910G 75.56 1.15 169 osd.75
76 hdd 3.63599 1.00000 3723G 2861G 862G 76.85 1.17 170 osd.76
77 hdd 3.63599 1.00000 3723G 2807G 916G 75.39 1.15 168 osd.77
78 hdd 3.63599 1.00000 3723G 2678G 1045G 71.92 1.10 156 osd.78
79 hdd 3.63599 1.00000 3723G 2556G 1166G 68.67 1.05 160 osd.79
80 hdd 3.63599 1.00000 3723G 3082G 640G 82.79 1.26 190 osd.80
81 hdd 3.63599 1.00000 3723G 2418G 1305G 64.94 0.99 144 osd.81
82 hdd 3.63599 1.00000 3723G 2881G 841G 77.39 1.18 161 osd.82
83 hdd 3.63599 1.00000 3723G 3039G 683G 81.64 1.25 175 osd.83
-20 90.91017 - 130T 61630G 72421G 45.98 0.70 - rack rack08
-19 43.63699 - 44684G 30861G 13823G 69.06 1.05 - host cephnode08
84 hdd 3.63599 1.00000 3723G 2532G 1190G 68.02 1.04 157 osd.84
85 hdd 3.63599 1.00000 3723G 2518G 1205G 67.64 1.03 166 osd.85
86 hdd 3.63599 1.00000 3723G 2504G 1219G 67.25 1.03 151 osd.86
87 hdd 3.63599 1.00000 3723G 2698G 1024G 72.47 1.11 161 osd.87
88 hdd 3.63599 1.00000 3723G 2527G 1196G 67.87 1.04 147 osd.88
89 hdd 3.63599 1.00000 3723G 2508G 1215G 67.36 1.03 142 osd.89
90 hdd 3.63599 1.00000 3723G 2317G 1406G 62.24 0.95 142 osd.90
91 hdd 3.63599 1.00000 3723G 2582G 1140G 69.36 1.06 147 osd.91
92 hdd 3.63599 1.00000 3723G 2656G 1066G 71.35 1.09 144 osd.92
93 hdd 3.63599 1.00000 3723G 2448G 1275G 65.74 1.00 154 osd.93
94 hdd 3.63599 1.00000 3723G 2783G 939G 74.76 1.14 163 osd.94
95 hdd 3.63599 1.00000 3723G 2782G 941G 74.73 1.14 152 osd.95
-21 43.63678 - 44684G 30331G 14353G 67.88 1.04 - host cephnode09
96 hdd 3.63640 1.00000 3723G 3003G 719G 80.67 1.23 161 osd.96
97 hdd 3.63640 1.00000 3723G 2581G 1142G 69.32 1.06 151 osd.97
98 hdd 3.63640 1.00000 3723G 2118G 1605G 56.88 0.87 140 osd.98
99 hdd 3.63640 1.00000 3723G 2926G 796G 78.60 1.20 165 osd.99
100 hdd 3.63640 1.00000 3723G 2492G 1231G 66.92 1.02 149 osd.100
101 hdd 3.63640 1.00000 3723G 2605G 1117G 69.98 1.07 165 osd.101
102 hdd 3.63640 1.00000 3723G 2159G 1563G 58.01 0.89 141 osd.102
103 hdd 3.63640 1.00000 3723G 2328G 1395G 62.53 0.95 146 osd.103
104 hdd 3.63640 1.00000 3723G 2624G 1099G 70.48 1.08 163 osd.104
105 hdd 3.63640 1.00000 3723G 2582G 1141G 69.34 1.06 142 osd.105
106 hdd 3.63640 1.00000 3723G 2401G 1322G 64.48 0.98 161 osd.106
107 hdd 3.63640 1.00000 3723G 2507G 1216G 67.33 1.03 159 osd.107
-43 3.63640 - 44684G 438G 44245G 0.98 0.01 - host cephnode10 ## Added after cluster pools got full
108 hdd 3.63640 1.00000 3723G 51915M 3672G 1.36 0.02 36 osd.108
109 hdd 0 1.00000 3723G 72735M 3652G 1.91 0.03 4 osd.109
110 hdd 0 1.00000 3723G 36948M 3687G 0.97 0.01 2 osd.110
111 hdd 0 1.00000 3723G 37043M 3687G 0.97 0.01 2 osd.111
112 hdd 0 1.00000 3723G 72382M 3652G 1.90 0.03 4 osd.112
113 hdd 0 1.00000 3723G 54850M 3670G 1.44 0.02 3 osd.113
114 hdd 0 1.00000 3723G 36664M 3687G 0.96 0.01 2 osd.114
115 hdd 0 1.00000 3723G 36087M 3688G 0.95 0.01 2 osd.115
116 hdd 0 1.00000 3723G 12066M 3711G 0.32 0.00 0 osd.116
117 hdd 0 1.00000 3723G 36793M 3687G 0.96 0.01 2 osd.117
118 hdd 0 1.00000 3723G 775M 3722G 0.02 0 0 osd.118
119 hdd 0 1.00000 3723G 760M 3722G 0.02 0 0 osd.119
TOTAL 425T 278T 146T 65.51
MIN/MAX VAR: 0/1.37 STDDEV: 23.07
#
I'm wondering why a single FULL OSD made all cluster pools full. Does the OSD_FULL state stop write operations to all OSDs on the node where the full OSD resides, or only to the affected OSD itself?
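For reference, the thresholds involved can be checked as below; as far as I understand, the full ratio can also be raised temporarily in an emergency (the default is 0.95, and pushing it much higher is of course risky):
# ceph osd dump | grep ratio
# ceph osd set-full-ratio 0.96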
Should pg_num/pgp_num be increased to get better data balancing across all OSDs?
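If so, I assume it would be the usual two-step per pool, e.g. for volumes (the target value here is only illustrative, and we would do it gradually given how tight on space the cluster is):
# ceph osd pool set volumes pg_num 2048
# ceph osd pool set volumes pgp_num 2048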
Why is there only 7415G MAX AVAIL for the volumes pool and 14831G for the default.rgw.buckets.data pool, while the cluster's %RAW USED is only 65.50? Is it somehow related to the badly balanced node cephnode05 (its OSDs are highly utilized) and to the fact that K+M of the EC pool was equal to the number of nodes in the cluster?
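Rough arithmetic with the numbers above: 7415G x 3 (replica 3) ~= 22245G raw, and 14831G x 9/6 (EC 6+3 overhead) ~= 22247G raw, so both MAX AVAIL figures seem to map to the same ~22T of raw space considered usable, presumably derived from the free space of the most utilized OSDs (the ones on cephnode05) rather than from the cluster-wide average.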
Best Regards
Jakub