Ciao Dan, thanks for your messages!
On 4/1/22 11:25, Dan van der Ster wrote:
The PGs are stale, down, inactive *because* the OSDs don't start.
Your main effort should be to bring the OSDs up, without purging or
zapping or anything like that.
(Currently your cluster is down, but there are hopes to recover. If
you start purging things, that can result in permanent data loss.)
Sure, will not do anything like purge/whatever, as long as I can abuse
your patience...
Looking for the string 'start interval does not contain the required
bound' I found similar errors in the three OSDs:
osd.158: 85.12s0
osd.145: 85.33s0
osd.121: 85.11s0
Is that log also for PG 85.12 on the other OSDs?
Not sure I am getting your point here, sorry. I grep'ed for that string
in the above logs, and only found the occurrences I mentioned. To be
specific, a reference to 85.12 was found only on osd.158 and not on the
other 'down' OSDs.
Here is the output of "pg 85.12 query":
https://pastebin.ubuntu.com/p/ww3JdwDXVd/
and its status (also showing the other 85.XX, for reference):
This is very weird:
"up": [
2147483647,
2147483647,
2147483647,
2147483647,
2147483647
],
"acting": [
67,
91,
82,
2147483647,
112
],
Right now, do the following:
ceph osd set norebalance
That will prevent PGs moving from one OSD to another *unless* they are degraded.
Done
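(And, if I understand correctly, once everything is back to health I can
drop it again with
  ceph --cluster cephpa1 osd unset norebalance
but I will leave it set until you give the green light.)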
2. My theory about what happened here. Your crush rule change "osd ->
host" below basically asked all PGs to be moved.
Some glitch happened and some broken parts of PG 85.12 ended up on
some OSDs, now causing those OSDs to crash.
85.12 is "fine", I mean active, now because there are enough complete
parts of it on other osds.
The fact that "up" above is listing '2147483647' for every osd means
your new crush rule is currently broken. Let's deal with fixing that
later.
Hmm, in theory it looks correct, but I see your point: indeed I am stuck
with a 1-3% fraction of the objects misplaced/degraded, all of them in
pool 85:
~]$ ceph --cluster cephpa1 health detail | grep -e mispl -e degra
HEALTH_WARN norebalance flag(s) set; Reduced data availability: 1 pg
inactive, 1 pg down, 18 pgs stale; Degraded data redundancy:
2206683/190788911 objects degraded (1.157%), 41 pgs degraded, 41 pgs
undersized; 30 pgs not deep-scrubbed in time; 313 pgs not scrubbed in
time; 1 pools have too many placement groups; 188 daemons have recently
crashed
PG_DEGRADED Degraded data redundancy: 2206683/190788911 objects degraded
(1.157%), 41 pgs degraded, 41 pgs undersized
pg 85.2 is stuck undersized for 185395.897287, current state
active+undersized+degraded+remapped, last acting [159,2147483647,124,120,90]
pg 85.3 is stuck undersized for 291884.265248, current state
active+undersized+degraded+remapped, last acting [177,2147483647,57,113,102]
pg 85.4 is stuck undersized for 291693.344022, current state
active+undersized+degraded+remapped, last acting [67,77,73,2147483647,104]
pg 85.5 is stuck undersized for 185397.250499, current state
active+undersized+degraded+remapped, last acting
[2147483647,91,2147483647,82,73]
pg 85.6 is stuck undersized for 291884.257218, current state
active+undersized+degraded+remapped, last acting [120,2147483647,177,38,67]
pg 85.8 is stuck undersized for 291694.629801, current state
active+undersized+degraded+remapped, last acting
[38,72,2147483647,2147483647,77]
pg 85.a is stuck undersized for 291897.611536, current state
active+undersized+degraded+remapped, last acting [104,90,120,38,2147483647]
pg 85.b is stuck undersized for 185395.911007, current state
active+undersized+degraded+remapped, last acting
[170,104,2147483647,143,124]
pg 85.c is stuck undersized for 185536.672676, current state
active+undersized+degraded+remapped, last acting
[2147483647,2147483647,57,91,68]
pg 85.d is stuck undersized for 291663.760018, current state
active+undersized+degraded+remapped, last acting [2147483647,72,67,124,143]
pg 85.e is stuck undersized for 291693.403160, current state
active+undersized+degraded+remapped, last acting [82,77,90,2147483647,113]
pg 85.f is stuck undersized for 291860.326142, current state
active+undersized+degraded+remapped, last acting [177,2147483647,143,95,72]
pg 85.12 is stuck undersized for 291685.709020, current state
active+undersized+degraded+remapped, last acting [67,91,82,2147483647,112]
pg 85.14 is stuck undersized for 185397.199612, current state
active+undersized+degraded+remapped, last acting
[2147483647,72,119,2147483647,38]
pg 85.15 is stuck undersized for 185535.535864, current state
active+undersized+degraded+remapped, last acting [77,96,119,57,2147483647]
pg 85.18 is stuck undersized for 291860.291557, current state
active+undersized+degraded+remapped, last acting [119,2147483647,67,95,91]
pg 85.1b is stuck undersized for 185611.873774, current state
active+undersized+degraded+remapped, last acting [90,112,2147483647,77,72]
pg 85.1c is stuck undersized for 291947.099227, current state
active+undersized+degraded+remapped, last acting
[2147483647,177,73,2147483647,38]
pg 85.1d is stuck undersized for 292058.016158, current state
active+undersized+degraded+remapped, last acting
[177,67,2147483647,2147483647,68]
pg 85.1e is stuck undersized for 291899.223212, current state
active+undersized+degraded+remapped, last acting
[177,2147483647,120,2147483647,38]
pg 85.1f is stuck undersized for 185535.548719, current state
active+undersized+degraded+remapped, last acting
[170,113,104,119,2147483647]
pg 85.20 is stuck undersized for 291694.562719, current state
active+undersized+degraded+remapped, last acting
[67,38,2147483647,2147483647,90]
pg 85.21 is stuck undersized for 291688.277782, current state
active+undersized+degraded+remapped, last acting [143,2147483647,90,112,72]
pg 85.22 is stuck undersized for 291947.095445, current state
active+undersized+degraded+remapped, last acting
[82,177,73,2147483647,2147483647]
pg 85.24 is stuck undersized for 291688.257642, current state
active+undersized+degraded+remapped, last acting [177,112,57,2147483647,67]
pg 85.27 is stuck undersized for 291884.264556, current state
active+undersized+degraded+remapped, last acting [177,67,2147483647,119,102]
pg 85.28 is stuck undersized for 368975.505176, current state
active+undersized+degraded+remapped, last acting [112,72,2147483647,57,113]
pg 85.2a is stuck undersized for 185519.812553, current state
active+undersized+degraded+remapped, last acting [95,96,2147483647,72,159]
pg 85.2b is stuck undersized for 185397.201806, current state
active+undersized+degraded+remapped, last acting
[67,112,91,2147483647,2147483647]
pg 85.2e is stuck undersized for 291945.847116, current state
active+undersized+degraded+remapped, last acting [2147483647,72,90,113,57]
pg 85.2f is stuck undersized for 292074.339457, current state
active+undersized+degraded+remapped, last acting [102,119,68,2147483647,120]
pg 85.30 is stuck undersized for 292121.916283, current state
active+undersized+degraded+remapped, last acting [2147483647,177,113,72,120]
pg 85.31 is stuck undersized for 185536.671705, current state
active+undersized+degraded+remapped, last acting
[2147483647,57,67,2147483647,82]
pg 85.32 is stuck undersized for 291881.774711, current state
active+undersized+degraded+remapped, last acting
[143,2147483647,82,2147483647,77]
pg 85.34 is stuck undersized for 291729.633382, current state
active+undersized+degraded+remapped, last acting
[77,2147483647,95,104,2147483647]
pg 85.35 is stuck undersized for 291705.165203, current state
active+undersized+degraded+remapped, last acting
[77,82,95,2147483647,2147483647]
pg 85.36 is stuck undersized for 185535.552076, current state
active+undersized+degraded+remapped, last acting [2147483647,170,95,104,124]
pg 85.37 is stuck undersized for 291694.614993, current state
active+undersized+degraded+remapped, last acting [2147483647,120,113,104,68]
pg 85.3b is stuck undersized for 185395.908694, current state
active+undersized+degraded+remapped, last acting [119,2147483647,67,77,96]
pg 85.3d is stuck undersized for 185397.256717, current state
active+undersized+degraded+remapped, last acting
[95,170,2147483647,2147483647,159]
pg 85.3f is stuck undersized for 291663.857910, current state
active+undersized+degraded+remapped, last acting [177,124,82,119,2147483647]
3. Question -- what is the output of `ceph osd pool ls detail | grep
csd-dataonly-ec-pool` ? If you have `min_size 3` there, then this is
part of the root cause of the outage here. At the end of this thread,
*only after everything is recovered and no PGs are
undersized/degraded* , you will need to set it back with `ceph osd pool
set csd-dataonly-ec-pool min_size 4`.
Indeed, it's 3. Connected to your last point below (never mess with
crush rules while anything is ongoing): during the rebalancing something
got stuck, and I think "health detail" was suggesting that reducing
min_size would help. I took note of the pools for which I updated the
parameter, and will go back to the proper values once the situation is
clean again.
pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5
object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
last_change 616460 flags
hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288
application rbd
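Once recovery completes and nothing is undersized/degraded any more, I
understand the fix is simply (for this pool, and analogously for the
other pools in my notes):
  ceph --cluster cephpa1 osd pool set csd-dataonly-ec-pool min_size 4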
4. The immediate goal should be to try to get osd.158 to start up, by
"removing" the corrupted part of PG 85.12 from it.
IF we can get osd.158 started, then the same approach should work for
the other OSDs.
From your previous log, osd.158 has a broken piece of pg 85.12. Let's
export-remove it:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-158/
--op export-remove --pgid 85.12s0 > osd.158-85.12s0.bin
Please do that, then try to start osd.158, and report back here.
Did that, and osd.158 is now UP, thanks! I think the output of "ceph -s"
did not change but that's a consequence of norebalance, I guess.
If I understand correctly, it should now be safe (but I will wait for
your green light) to repeat the same for:
osd.121 chunk 85.11s0
osd.145 chunk 85.33s0
so they can also start. And once started, I can clear the
"norebalance" flag, correct?
Two more questions below...
85.11  39501      0       0  0  165479411712  0  0  3000  stale+active+clean                   3d  606021'532631  617659:1827554  [124,157,68,72,102]p124  [124,157,68,72,102]p124  2022-03-28 07:21:00.566032  2022-03-28 07:21:00.566032
85.12  39704  39704  158816  0  166350008320  0  0  3028  active+undersized+degraded+remapped  3d  606021'573200  620336:1839924  [2147483647,2147483647,2147483647,2147483647,2147483647]p-1  [67,91,82,2147483647,112]p67  2022-03-15 03:25:28.478280  2022-03-12 19:10:45.866650
85.25  39402      0       0  0  165108592640  0  0  3098  stale+down+remapped                  3d  606021'521273  618930:1734492  [2147483647,2147483647,2147483647,2147483647,2147483647]p-1  [2147483647,2147483647,96,2147483647,2147483647]p96  2022-03-15 04:08:42.561720  2022-03-09 17:05:34.205121
85.33  39319      0       0  0  164740796416  0  0  3000  stale+active+clean                   3d  606021'513259  617659:2125167  [174,112,85,102,124]p174  [174,112,85,102,124]p174  2022-03-28 07:21:12.097873  2022-03-28 07:21:12.097873
So 85.11 and 85.33 do not look bad after all: why are the relevant OSDs
complaining? Is there a way to force those OSDs to forget about the
chunks they hold, since apparently those chunks have already safely
migrated elsewhere?
Indeed 85.12 is not really healthy...
As for chunks of 85.12 and 85.25, the 3 down OSDs have:
osd.121
85.12s3
85.25s3
osd.158
85.12s0
osd.145
none
I guess I can safely purge osd.145 and re-create it, then.
No!!! It contains crucial data for *other* PGs!
Ok! :-)
As for the history of the pool: this is an EC pool with metadata in an
SSD-backed replicated pool. At some point I realized I had made a
mistake in the allocation rule for the "data" part, so I changed the
relevant rule to:
~]$ ceph --cluster cephpa1 osd lspools | grep 85
85 csd-dataonly-ec-pool
~]$ ceph --cluster cephpa1 osd pool get csd-dataonly-ec-pool crush_rule
crush_rule: csd-data-pool
rule csd-data-pool {
id 5
type erasure
min_size 3
max_size 5
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class big
step choose indep 0 type host <--- this was "osd", before
step emit
}
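(If it is of any use, I could also test the rule offline with crushtool;
my understanding, to be double-checked, is that something like this
should show how rule 5 maps and whether it produces incomplete mappings,
using a temporary file for the compiled map:
  ceph --cluster cephpa1 osd getcrushmap -o /tmp/crushmap.bin
  crushtool -i /tmp/crushmap.bin --test --rule 5 --num-rep 5 --show-mappings | head
  crushtool -i /tmp/crushmap.bin --test --rule 5 --num-rep 5 --show-bad-mappings
)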
Can you please share the output of `ceph osd tree` ?
We need to understand why crush is not working any more for your pool.
Sure! Here it is. For historical reasons there are buckets of type
"storage", which you can safely ignore as they are no longer present in
any crush rule.
Please also don't worry about the funny weights, as I am preparing for a
hardware replacement and am freeing up space.
~]# ceph --cluster cephpa1 osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT
PRI-AFF
-1 125.79997 root default
-8 42.70000 rack rack1
-53 1.64999 host r1srv05
-54 0.39999 storage r1srv05sto1
57 big 0.09999 osd.57 up 1.00000
1.00000
67 big 0.09999 osd.67 up 1.00000
1.00000
72 big 0.09999 osd.72 up 1.00000
1.00000
90 big 0.09999 osd.90 up 1.00000
1.00000
-55 1.25000 storage r1srv05sto2
47 hdd 0.25000 osd.47 up 1.00000
1.00000
52 hdd 0.25000 osd.52 up 1.00000
1.00000
61 hdd 0.25000 osd.61 up 1.00000
1.00000
80 hdd 0.25000 osd.80 up 1.00000
1.00000
164 hdd 0.25000 osd.164 up 1.00000
1.00000
-2 1.04999 host r1srv07
-4 1.00000 storage r1srv07sto1
5 hdd 0.25000 osd.5 up 1.00000
1.00000
14 hdd 0.25000 osd.14 up 1.00000
1.00000
18 hdd 0.25000 osd.18 up 1.00000
1.00000
49 hdd 0.25000 osd.49 up 1.00000
1.00000
-5 0.04999 storage r1srv07sto2
99 big 0.00999 osd.99 up 1.00000
1.00000
128 big 0.00999 osd.128 up 1.00000
1.00000
131 big 0.00999 osd.131 up 1.00000
1.00000
134 big 0.00999 osd.134 up 1.00000
1.00000
135 big 0.00999 osd.135 up 1.00000
1.00000
-89 20.00000 host r1srv100
-100 20.00000 storage r1srv100sto1
119 big 1.00000 osd.119 up 1.00000
1.00000
157 big 1.00000 osd.157 up 1.00000
1.00000
175 big 1.00000 osd.175 up 1.00000
1.00000
177 big 1.00000 osd.177 up 1.00000
1.00000
54 hdd 1.00000 osd.54 up 1.00000
1.00000
69 hdd 1.00000 osd.69 up 1.00000
1.00000
160 hdd 1.00000 osd.160 up 1.00000
1.00000
165 hdd 1.00000 osd.165 up 1.00000
1.00000
167 hdd 1.00000 osd.167 up 1.00000
1.00000
169 hdd 1.00000 osd.169 up 1.00000
1.00000
171 hdd 1.00000 osd.171 up 1.00000
1.00000
172 hdd 1.00000 osd.172 up 1.00000
1.00000
173 hdd 1.00000 osd.173 up 1.00000
1.00000
2 ssd 1.00000 osd.2 up 1.00000
1.00000
9 ssd 1.00000 osd.9 up 1.00000
1.00000
17 ssd 1.00000 osd.17 up 1.00000
1.00000
28 ssd 1.00000 osd.28 up 1.00000
1.00000
39 ssd 1.00000 osd.39 up 1.00000
1.00000
178 test 1.00000 osd.178 up 1.00000
1.00000
179 test 1.00000 osd.179 up 1.00000
1.00000
-169 20.00000 host r1sto08
-189 20.00000 storage r1sto08sto1
34 big 1.00000 osd.34 up 1.00000
1.00000
56 big 1.00000 osd.56 up 1.00000
1.00000
64 big 1.00000 osd.64 up 1.00000
1.00000
68 big 1.00000 osd.68 up 1.00000
1.00000
13 hdd 1.00000 osd.13 up 1.00000
1.00000
29 hdd 1.00000 osd.29 up 1.00000
1.00000
31 hdd 1.00000 osd.31 up 1.00000
1.00000
41 hdd 1.00000 osd.41 up 1.00000
1.00000
43 hdd 1.00000 osd.43 up 1.00000
1.00000
46 hdd 1.00000 osd.46 up 1.00000
1.00000
60 hdd 1.00000 osd.60 up 1.00000
1.00000
62 hdd 1.00000 osd.62 up 1.00000
1.00000
63 hdd 1.00000 osd.63 up 1.00000
1.00000
4 ssd 1.00000 osd.4 up 1.00000
1.00000
16 ssd 1.00000 osd.16 up 1.00000
1.00000
21 ssd 1.00000 osd.21 up 1.00000
1.00000
22 ssd 1.00000 osd.22 up 1.00000
1.00000
24 ssd 1.00000 osd.24 up 1.00000
1.00000
36 test 1.00000 osd.36 up 1.00000
1.00000
53 test 1.00000 osd.53 up 1.00000
1.00000
-9 41.04999 rack rack2
-3 0.54999 host r2srv02
-6 0.50000 storage r2srv02sto1
38 big 0.09999 osd.38 up 1.00000
1.00000
73 big 0.09999 osd.73 up 1.00000
1.00000
82 big 0.09999 osd.82 up 1.00000
1.00000
113 big 0.09999 osd.113 up 1.00000
1.00000
120 big 0.09999 osd.120 up 1.00000
1.00000
-7 0.04999 storage r2srv02sto2
132 big 0.00999 osd.132 up 1.00000
1.00000
137 big 0.00999 osd.137 up 1.00000
1.00000
139 big 0.00999 osd.139 up 1.00000
1.00000
141 big 0.00999 osd.141 up 1.00000
1.00000
142 big 0.00999 osd.142 up 1.00000
1.00000
-91 18.00000 host r2srv100
-97 18.00000 storage r2srv100sto1
121 big 0 osd.121 down 0
1.00000
158 big 0 osd.158 up 1.00000
1.00000
159 big 1.00000 osd.159 up 1.00000
1.00000
176 big 1.00000 osd.176 up 1.00000
1.00000
146 hdd 1.00000 osd.146 up 1.00000
1.00000
148 hdd 1.00000 osd.148 up 1.00000
1.00000
149 hdd 1.00000 osd.149 up 1.00000
1.00000
150 hdd 1.00000 osd.150 up 1.00000
1.00000
151 hdd 1.00000 osd.151 up 1.00000
1.00000
152 hdd 1.00000 osd.152 up 1.00000
1.00000
153 hdd 1.00000 osd.153 up 1.00000
1.00000
154 hdd 1.00000 osd.154 up 1.00000
1.00000
155 hdd 1.00000 osd.155 up 1.00000
1.00000
25 ssd 1.00000 osd.25 up 1.00000
1.00000
35 ssd 1.00000 osd.35 up 1.00000
1.00000
51 ssd 1.00000 osd.51 up 1.00000
1.00000
65 ssd 1.00000 osd.65 up 1.00000
1.00000
144 ssd 1.00000 osd.144 up 1.00000
1.00000
161 test 1.00000 osd.161 up 1.00000
1.00000
162 test 1.00000 osd.162 up 1.00000
1.00000
-65 2.50000 host r2srv15
-67 1.25000 storage r2srv15sto1
1 hdd 0.25000 osd.1 up 1.00000
1.00000
6 hdd 0.25000 osd.6 up 1.00000
1.00000
10 hdd 0.25000 osd.10 up 1.00000
1.00000
15 hdd 0.25000 osd.15 up 1.00000
1.00000
19 hdd 0.25000 osd.19 up 1.00000
1.00000
-68 1.25000 storage r2srv15sto2
27 hdd 0.25000 osd.27 up 1.00000
1.00000
33 hdd 0.25000 osd.33 up 1.00000
1.00000
44 hdd 0.25000 osd.44 up 1.00000
1.00000
59 hdd 0.25000 osd.59 up 1.00000
1.00000
79 hdd 0.25000 osd.79 up 1.00000
1.00000
-170 20.00000 host r2sto08
-188 20.00000 storage r2sto08sto1
85 big 1.00000 osd.85 up 1.00000
1.00000
96 big 1.00000 osd.96 up 1.00000
1.00000
102 big 1.00000 osd.102 up 1.00000
1.00000
104 big 1.00000 osd.104 up 1.00000
1.00000
8 hdd 1.00000 osd.8 up 1.00000
1.00000
81 hdd 1.00000 osd.81 up 1.00000
1.00000
83 hdd 1.00000 osd.83 up 1.00000
1.00000
87 hdd 1.00000 osd.87 up 1.00000
1.00000
88 hdd 1.00000 osd.88 up 1.00000
1.00000
92 hdd 1.00000 osd.92 up 1.00000
1.00000
97 hdd 1.00000 osd.97 up 1.00000
1.00000
98 hdd 1.00000 osd.98 up 1.00000
1.00000
100 hdd 1.00000 osd.100 up 1.00000
1.00000
3 ssd 1.00000 osd.3 up 1.00000
1.00000
70 ssd 1.00000 osd.70 up 1.00000
1.00000
74 ssd 1.00000 osd.74 up 1.00000
1.00000
76 ssd 1.00000 osd.76 up 1.00000
1.00000
78 ssd 1.00000 osd.78 up 1.00000
1.00000
86 test 1.00000 osd.86 up 1.00000
1.00000
94 test 1.00000 osd.94 up 1.00000
1.00000
-10 42.04999 rack rack3
-66 2.50000 host r3srv10
-69 1.25000 storage r3srv10sto1
48 hdd 0.25000 osd.48 up 1.00000
1.00000
50 hdd 0.25000 osd.50 up 1.00000
1.00000
58 hdd 0.25000 osd.58 up 1.00000
1.00000
84 hdd 0.25000 osd.84 up 1.00000
1.00000
101 hdd 0.25000 osd.101 up 1.00000
1.00000
-70 1.25000 storage r3srv10sto2
11 hdd 0.25000 osd.11 up 1.00000
1.00000
20 hdd 0.25000 osd.20 up 1.00000
1.00000
26 hdd 0.25000 osd.26 up 1.00000
1.00000
37 hdd 0.25000 osd.37 up 1.00000
1.00000
42 hdd 0.25000 osd.42 up 1.00000
1.00000
-94 20.00000 host r3srv100
-96 20.00000 storage r3srv100sto1
112 big 1.00000 osd.112 up 1.00000
1.00000
124 big 1.00000 osd.124 up 1.00000
1.00000
156 big 1.00000 osd.156 up 1.00000
1.00000
174 big 1.00000 osd.174 up 1.00000
1.00000
55 hdd 1.00000 osd.55 up 1.00000
1.00000
71 hdd 1.00000 osd.71 up 1.00000
1.00000
89 hdd 1.00000 osd.89 up 1.00000
1.00000
103 hdd 1.00000 osd.103 up 1.00000
1.00000
105 hdd 1.00000 osd.105 up 1.00000
1.00000
107 hdd 1.00000 osd.107 up 1.00000
1.00000
109 hdd 1.00000 osd.109 up 1.00000
1.00000
110 hdd 1.00000 osd.110 up 1.00000
1.00000
111 hdd 1.00000 osd.111 up 1.00000
1.00000
0 ssd 1.00000 osd.0 up 1.00000
1.00000
23 ssd 1.00000 osd.23 up 1.00000
1.00000
30 ssd 1.00000 osd.30 up 1.00000
1.00000
40 ssd 1.00000 osd.40 up 1.00000
1.00000
45 ssd 1.00000 osd.45 up 1.00000
1.00000
126 test 1.00000 osd.126 up 1.00000
1.00000
127 test 1.00000 osd.127 up 1.00000
1.00000
-11 0.54999 host r3srv15
-12 0.50000 storage r3srv15sto1
77 big 0.09999 osd.77 up 1.00000
1.00000
91 big 0.09999 osd.91 up 1.00000
1.00000
95 big 0.09999 osd.95 up 1.00000
1.00000
140 big 0.09999 osd.140 up 1.00000
1.00000
143 big 0.09999 osd.143 up 1.00000
1.00000
-13 0.04999 storage r3srv15sto2
66 big 0.00999 osd.66 up 1.00000
1.00000
75 big 0.00999 osd.75 up 1.00000
1.00000
114 big 0.00999 osd.114 up 1.00000
1.00000
118 big 0.00999 osd.118 up 1.00000
1.00000
125 big 0.00999 osd.125 up 1.00000
1.00000
-171 19.00000 host r3sto08
-187 19.00000 storage r3sto08sto1
123 big 1.00000 osd.123 up 1.00000
1.00000
145 big 0 osd.145 down 0
1.00000
168 big 1.00000 osd.168 up 1.00000
1.00000
170 big 1.00000 osd.170 up 1.00000
1.00000
12 hdd 1.00000 osd.12 up 1.00000
1.00000
117 hdd 1.00000 osd.117 up 1.00000
1.00000
122 hdd 1.00000 osd.122 up 1.00000
1.00000
130 hdd 1.00000 osd.130 up 1.00000
1.00000
133 hdd 1.00000 osd.133 up 1.00000
1.00000
136 hdd 1.00000 osd.136 up 1.00000
1.00000
147 hdd 1.00000 osd.147 up 1.00000
1.00000
163 hdd 1.00000 osd.163 up 1.00000
1.00000
166 hdd 1.00000 osd.166 up 1.00000
1.00000
7 ssd 1.00000 osd.7 up 1.00000
1.00000
106 ssd 1.00000 osd.106 up 1.00000
1.00000
108 ssd 1.00000 osd.108 up 1.00000
1.00000
115 ssd 1.00000 osd.115 up 1.00000
1.00000
116 ssd 1.00000 osd.116 up 1.00000
1.00000
129 test 1.00000 osd.129 up 1.00000
1.00000
138 test 1.00000 osd.138 up 1.00000
1.00000
At the time I changed the rule there was no 'down' PG; all PGs in the
cluster were 'active' plus possibly some other state (remapped,
degraded, whatever), as I had added some new disk servers a few days
before.
Never make crush rule changes when any PG is degraded, remapped, or whatever!
They must all be active+clean to consider big changes like injecting a
new crush rule!!
Ok, now I think I have learned it. In my mind it was a sort of
optimization: since I was already moving stuff around due to the
additional servers, why not update the crush rule at the same time?
Will remember the lesson for the future.
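For instance, before any future crush change I will make sure nothing is
non-clean, with something along these lines (my rough sketch):
  ceph --cluster cephpa1 pg stat
  ceph --cluster cephpa1 pg dump pgs_brief | grep -v active+clean | head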
Thanks!
Fulvio
--
Fulvio Galeazzi
GARR-CSD Department
tel.: +39-334-6533-250
skype: fgaleazzi70
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx