Re: PG down, due to 3 OSD failing

Ciao Dan, thanks for your messages!

On 4/1/22 11:25, Dan van der Ster wrote:
The PGs are stale, down, inactive *because* the OSDs don't start.
Your main efforts should be to bring OSDs up, without purging or
zapping or anything like that.
(Currently your cluster is down, but there is hope to recover. If
you start purging things, that can result in permanent data loss.)

Sure, will not do anything like purge/whatever, as long as I can abuse your patience...


Looking for the string 'start interval does not contain the required
bound' I found similar errors in the three OSDs:
osd.158: 85.12s0
osd.145: 85.33s0
osd.121: 85.11s0

Is that log also for PG 85.12 on the other OSDs?

Not sure I am getting your point here, sorry. I grepped that string in the above logs, and only found the occurrences I mentioned. To be specific, a reference to 85.12 was found only on osd.158 and not on the other 'down' OSDs.

Here is the output of "pg 85.12 query":
         https://pastebin.ubuntu.com/p/ww3JdwDXVd/
   and its status (also showing the other 85.XX, for reference):

This is very weird:

     "up": [
         2147483647,
         2147483647,
         2147483647,
         2147483647,
         2147483647
     ],
     "acting": [
         67,
         91,
         82,
         2147483647,
         112
     ],

Right now, do the following:
   ceph osd set norebalance
That will prevent PGs moving from one OSD to another *unless* they are degraded.

Done
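
(For anyone following along: the flag can be double-checked afterwards, since it shows up both in the osdmap flags and in the status/health output, e.g.

ceph --cluster cephpa1 osd dump | grep flags
ceph --cluster cephpa1 -s | grep flags

"norebalance" should appear in both.)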

2. My theory about what happened here. Your crush rule change "osd ->
host" below basically asked all PGs to be moved.
Some glitch happened and some broken parts of PG 85.12 ended up on
some OSDs, now causing those OSDs to crash.
85.12 is "fine", I mean active, now because there are enough complete
parts of it on other osds.
The fact that "up" above is listing '2147483647' for every osd means
your new crush rule is currently broken. Let's deal with fixing that
later.
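
(Side note on the magic number: 2147483647 is 2^31-1, the placeholder CRUSH returns when it cannot fill a slot of the up/acting set, so an "up" set made entirely of that value means CRUSH is currently placing zero chunks for the PG. A quick, rough way to count how many PGs of the pool have at least one such hole -- assuming the pool name csd-dataonly-ec-pool as elsewhere in this thread:

ceph --cluster cephpa1 pg ls-by-pool csd-dataonly-ec-pool | grep -c 2147483647
)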

Hmm, in theory it looks correct, but I see your point; in fact I am stuck with some 1-3% of the objects misplaced/degraded, all of them in pool 85:

~]$ ceph --cluster cephpa1 health detail | grep -e mispl -e degra
HEALTH_WARN norebalance flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down, 18 pgs stale; Degraded data redundancy: 2206683/190788911 objects degraded (1.157%), 41 pgs degraded, 41 pgs undersized; 30 pgs not deep-scrubbed in time; 313 pgs not scrubbed in time; 1 pools have too many placement groups; 188 daemons have recently crashed
PG_DEGRADED Degraded data redundancy: 2206683/190788911 objects degraded (1.157%), 41 pgs degraded, 41 pgs undersized
    pg 85.2 is stuck undersized for 185395.897287, current state active+undersized+degraded+remapped, last acting [159,2147483647,124,120,90]
    pg 85.3 is stuck undersized for 291884.265248, current state active+undersized+degraded+remapped, last acting [177,2147483647,57,113,102]
    pg 85.4 is stuck undersized for 291693.344022, current state active+undersized+degraded+remapped, last acting [67,77,73,2147483647,104]
    pg 85.5 is stuck undersized for 185397.250499, current state active+undersized+degraded+remapped, last acting [2147483647,91,2147483647,82,73]
    pg 85.6 is stuck undersized for 291884.257218, current state active+undersized+degraded+remapped, last acting [120,2147483647,177,38,67]
    pg 85.8 is stuck undersized for 291694.629801, current state active+undersized+degraded+remapped, last acting [38,72,2147483647,2147483647,77]
    pg 85.a is stuck undersized for 291897.611536, current state active+undersized+degraded+remapped, last acting [104,90,120,38,2147483647]
    pg 85.b is stuck undersized for 185395.911007, current state active+undersized+degraded+remapped, last acting [170,104,2147483647,143,124]
    pg 85.c is stuck undersized for 185536.672676, current state active+undersized+degraded+remapped, last acting [2147483647,2147483647,57,91,68]
    pg 85.d is stuck undersized for 291663.760018, current state active+undersized+degraded+remapped, last acting [2147483647,72,67,124,143]
    pg 85.e is stuck undersized for 291693.403160, current state active+undersized+degraded+remapped, last acting [82,77,90,2147483647,113]
    pg 85.f is stuck undersized for 291860.326142, current state active+undersized+degraded+remapped, last acting [177,2147483647,143,95,72]
    pg 85.12 is stuck undersized for 291685.709020, current state active+undersized+degraded+remapped, last acting [67,91,82,2147483647,112]
    pg 85.14 is stuck undersized for 185397.199612, current state active+undersized+degraded+remapped, last acting [2147483647,72,119,2147483647,38]
    pg 85.15 is stuck undersized for 185535.535864, current state active+undersized+degraded+remapped, last acting [77,96,119,57,2147483647]
    pg 85.18 is stuck undersized for 291860.291557, current state active+undersized+degraded+remapped, last acting [119,2147483647,67,95,91]
    pg 85.1b is stuck undersized for 185611.873774, current state active+undersized+degraded+remapped, last acting [90,112,2147483647,77,72]
    pg 85.1c is stuck undersized for 291947.099227, current state active+undersized+degraded+remapped, last acting [2147483647,177,73,2147483647,38]
    pg 85.1d is stuck undersized for 292058.016158, current state active+undersized+degraded+remapped, last acting [177,67,2147483647,2147483647,68]
    pg 85.1e is stuck undersized for 291899.223212, current state active+undersized+degraded+remapped, last acting [177,2147483647,120,2147483647,38]
    pg 85.1f is stuck undersized for 185535.548719, current state active+undersized+degraded+remapped, last acting [170,113,104,119,2147483647]
    pg 85.20 is stuck undersized for 291694.562719, current state active+undersized+degraded+remapped, last acting [67,38,2147483647,2147483647,90]
    pg 85.21 is stuck undersized for 291688.277782, current state active+undersized+degraded+remapped, last acting [143,2147483647,90,112,72]
    pg 85.22 is stuck undersized for 291947.095445, current state active+undersized+degraded+remapped, last acting [82,177,73,2147483647,2147483647]
    pg 85.24 is stuck undersized for 291688.257642, current state active+undersized+degraded+remapped, last acting [177,112,57,2147483647,67]
    pg 85.27 is stuck undersized for 291884.264556, current state active+undersized+degraded+remapped, last acting [177,67,2147483647,119,102]
    pg 85.28 is stuck undersized for 368975.505176, current state active+undersized+degraded+remapped, last acting [112,72,2147483647,57,113]
    pg 85.2a is stuck undersized for 185519.812553, current state active+undersized+degraded+remapped, last acting [95,96,2147483647,72,159]
    pg 85.2b is stuck undersized for 185397.201806, current state active+undersized+degraded+remapped, last acting [67,112,91,2147483647,2147483647]
    pg 85.2e is stuck undersized for 291945.847116, current state active+undersized+degraded+remapped, last acting [2147483647,72,90,113,57]
    pg 85.2f is stuck undersized for 292074.339457, current state active+undersized+degraded+remapped, last acting [102,119,68,2147483647,120]
    pg 85.30 is stuck undersized for 292121.916283, current state active+undersized+degraded+remapped, last acting [2147483647,177,113,72,120]
    pg 85.31 is stuck undersized for 185536.671705, current state active+undersized+degraded+remapped, last acting [2147483647,57,67,2147483647,82]
    pg 85.32 is stuck undersized for 291881.774711, current state active+undersized+degraded+remapped, last acting [143,2147483647,82,2147483647,77]
    pg 85.34 is stuck undersized for 291729.633382, current state active+undersized+degraded+remapped, last acting [77,2147483647,95,104,2147483647]
    pg 85.35 is stuck undersized for 291705.165203, current state active+undersized+degraded+remapped, last acting [77,82,95,2147483647,2147483647]
    pg 85.36 is stuck undersized for 185535.552076, current state active+undersized+degraded+remapped, last acting [2147483647,170,95,104,124]
    pg 85.37 is stuck undersized for 291694.614993, current state active+undersized+degraded+remapped, last acting [2147483647,120,113,104,68]
    pg 85.3b is stuck undersized for 185395.908694, current state active+undersized+degraded+remapped, last acting [119,2147483647,67,77,96]
    pg 85.3d is stuck undersized for 185397.256717, current state active+undersized+degraded+remapped, last acting [95,170,2147483647,2147483647,159]
    pg 85.3f is stuck undersized for 291663.857910, current state active+undersized+degraded+remapped, last acting [177,124,82,119,2147483647]

3. Question -- what is the output of `ceph osd pool ls detail | grep
csd-dataonly-ec-pool` ? If you have `min_size 3` there, then this is
part of the root cause of the outage here. At the end of this thread,
*only after everything is recovered and no PGs are
undersized/degraded* , you will need to set it `ceph osd pool set
csd-dataonly-ec-pool min_size 4`

Indeed, it's 3. Connected to your last point below (never mess with crush rules if there is anything ongoing), during rebalancing something was stuck and I think "health detail" was suggesting that reducing min_size would help. I took note of the pools for which I updated the parameter, and will go back to the proper values once the situation is clean again.

pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 616460 flags hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288 application rbd
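
(Since stripe_width 12288 = 3 x 4096, the profile is presumably k=3, m=2, so the value to go back to should be min_size = k+1 = 4. To be confirmed against the actual erasure-code profile, and applied only once nothing in the pool is undersized/degraded any more; the profile name below is a placeholder:

ceph --cluster cephpa1 osd erasure-code-profile ls
ceph --cluster cephpa1 osd erasure-code-profile get <profile-name>      # confirm k=3, m=2
ceph --cluster cephpa1 osd pool set csd-dataonly-ec-pool min_size 4     # only when clean again
)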

4. The immediate goal should be to try to get osd.158 to start up, by
"removing" the corrupted part of PG 85.12 from it.
IF we can get osd.158 started, then the same approach should work for
the other OSDs.
 From your previous log, osd.158 has a broken piece of pg 85.12. Let's
export-remove it:

ceph-objectstore-tool  --data-path /var/lib/ceph/osd/cephpa1-158/
--op export-remove --pgid 85.12s0 > osd.158-85.12s0.bin

Please do that, then try to start osd.158, and report back here.
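
(For safety: the exported chunk is not lost, it stays in the .bin file, and in principle ceph-objectstore-tool can put it back later with the matching import operation if it ever turned out to be needed. A sketch only, nothing to run now, and only ever with the OSD daemon stopped:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-158/ --op import --file osd.158-85.12s0.bin
)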

Did that, and osd.158 is now UP, thanks! I think the output of "ceph -s" did not change but that's a consequence of norebalance, I guess. If I understand correctly, it should now be safe (but I will wait for your green light) to repeat the same for:
osd.121 chunk 85.11s0
osd.145 chunk 85.33s0
so they can also start. And once started, I can clear the "norebalance" flag, correct?
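
To be explicit, the commands I have in mind, to be run only after your green light and with the OSD daemons stopped, mirroring the one above, would be:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-121/ --op export-remove --pgid 85.11s0 > osd.121-85.11s0.bin
ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-145/ --op export-remove --pgid 85.33s0 > osd.145-85.33s0.bin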

Two more questions below...


85.11    39501        0         0       0 165479411712           0       0 3000                  stale+active+clean    3d    606021'532631     617659:1827554    [124,157,68,72,102]p124    [124,157,68,72,102]p124    2022-03-28 07:21:00.566032    2022-03-28 07:21:00.566032
85.12    39704    39704    158816       0 166350008320           0       0 3028 active+undersized+degraded+remapped    3d    606021'573200     620336:1839924    [2147483647,2147483647,2147483647,2147483647,2147483647]p-1    [67,91,82,2147483647,112]p67    2022-03-15 03:25:28.478280    2022-03-12 19:10:45.866650
85.25    39402        0         0       0 165108592640           0       0 3098                 stale+down+remapped    3d    606021'521273     618930:1734492    [2147483647,2147483647,2147483647,2147483647,2147483647]p-1    [2147483647,2147483647,96,2147483647,2147483647]p96    2022-03-15 04:08:42.561720    2022-03-09 17:05:34.205121
85.33    39319        0         0       0 164740796416           0       0 3000                  stale+active+clean    3d    606021'513259     617659:2125167    [174,112,85,102,124]p174    [174,112,85,102,124]p174    2022-03-28 07:21:12.097873    2022-03-28 07:21:12.097873

So 85.11 and 85.33 do not look bad, after all: why are the relevant OSDs
complaining? Is there a way to force them (OSDs) to forget about the
chunks they possess, as apparently those have already safely migrated
elsewhere?
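
(If useful: the PGs actually present on a stopped OSD can be listed read-only with ceph-objectstore-tool, which is a safe way to confirm which chunks each down OSD still holds, e.g.

ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-121/ --op list-pgs

I assume the same export-remove approach could apply to these leftover chunks as well, but I will wait for advice before touching anything.)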

Indeed 85.12 is not really healthy...
As for chunks of 85.12 and 85.25, the 3 down OSDs have:
osd.121
         85.12s3
         85.25s3
osd.158
         85.12s0
osd.145
         none
I guess I can safely purge osd.145 and re-create it, then.

No!!! It contains crucial data for *other* PGs!

Ok! :-)

As for the history of the pool, this is an EC pool with metadata in a
SSD-backed replicated pool. At some point I realized I had made a
mistake in the allocation rule for the "data" part, so I changed the
relevant rule to:

~]$ ceph --cluster cephpa1 osd lspools | grep 85
85 csd-dataonly-ec-pool
~]$ ceph --cluster cephpa1 osd pool get csd-dataonly-ec-pool crush_rule
crush_rule: csd-data-pool

rule csd-data-pool {
          id 5
          type erasure
          min_size 3
          max_size 5
          step set_chooseleaf_tries 5
          step set_choose_tries 100
          step take default class big
          step choose indep 0 type host  <--- this was "osd", before
          step emit
}
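
(One way to test what this rule actually maps to, without touching the cluster, is to run the compiled crush map through crushtool; rule id 5 and 5 chunks are taken from the pool/rule shown above:

ceph --cluster cephpa1 osd getcrushmap -o /tmp/crushmap.bin
crushtool -i /tmp/crushmap.bin --test --rule 5 --num-rep 5 --show-mappings | head
crushtool -i /tmp/crushmap.bin --test --rule 5 --num-rep 5 --show-bad-mappings

If --show-bad-mappings reports entries, CRUSH really cannot place all five chunks with the current rule.)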

Can you please share the output of `ceph osd tree` ?

We need to understand why crush is not working any more for your pool.

Sure! Here it is. For historical reasons there are buckets of type "storage", which you can safely ignore as they are no longer referenced by any crush rule. Please also don't worry about the funny weights: I am preparing for a hardware replacement and am freeing up space.

~]# ceph --cluster cephpa1 osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 125.79997 root default
  -8 42.70000 rack rack1
    -53 1.64999 host r1srv05
      -54 0.39999 storage r1srv05sto1
        57 big 0.09999 osd.57 up 1.00000 1.00000
        67 big 0.09999 osd.67 up 1.00000 1.00000
        72 big 0.09999 osd.72 up 1.00000 1.00000
        90 big 0.09999 osd.90 up 1.00000 1.00000
      -55 1.25000 storage r1srv05sto2
        47 hdd 0.25000 osd.47 up 1.00000 1.00000
        52 hdd 0.25000 osd.52 up 1.00000 1.00000
        61 hdd 0.25000 osd.61 up 1.00000 1.00000
        80 hdd 0.25000 osd.80 up 1.00000 1.00000
        164 hdd 0.25000 osd.164 up 1.00000 1.00000
    -2 1.04999 host r1srv07
      -4 1.00000 storage r1srv07sto1
        5 hdd 0.25000 osd.5 up 1.00000 1.00000
        14 hdd 0.25000 osd.14 up 1.00000 1.00000
        18 hdd 0.25000 osd.18 up 1.00000 1.00000
        49 hdd 0.25000 osd.49 up 1.00000 1.00000
      -5 0.04999 storage r1srv07sto2
        99 big 0.00999 osd.99 up 1.00000 1.00000
        128 big 0.00999 osd.128 up 1.00000 1.00000
        131 big 0.00999 osd.131 up 1.00000 1.00000
        134 big 0.00999 osd.134 up 1.00000 1.00000
        135 big 0.00999 osd.135 up 1.00000 1.00000
    -89 20.00000 host r1srv100
      -100 20.00000 storage r1srv100sto1
        119 big 1.00000 osd.119 up 1.00000 1.00000
        157 big 1.00000 osd.157 up 1.00000 1.00000
        175 big 1.00000 osd.175 up 1.00000 1.00000
        177 big 1.00000 osd.177 up 1.00000 1.00000
        54 hdd 1.00000 osd.54 up 1.00000 1.00000
        69 hdd 1.00000 osd.69 up 1.00000 1.00000
        160 hdd 1.00000 osd.160 up 1.00000 1.00000
        165 hdd 1.00000 osd.165 up 1.00000 1.00000
        167 hdd 1.00000 osd.167 up 1.00000 1.00000
        169 hdd 1.00000 osd.169 up 1.00000 1.00000
        171 hdd 1.00000 osd.171 up 1.00000 1.00000
        172 hdd 1.00000 osd.172 up 1.00000 1.00000
        173 hdd 1.00000 osd.173 up 1.00000 1.00000
        2 ssd 1.00000 osd.2 up 1.00000 1.00000
        9 ssd 1.00000 osd.9 up 1.00000 1.00000
        17 ssd 1.00000 osd.17 up 1.00000 1.00000
        28 ssd 1.00000 osd.28 up 1.00000 1.00000
        39 ssd 1.00000 osd.39 up 1.00000 1.00000
        178 test 1.00000 osd.178 up 1.00000 1.00000
        179 test 1.00000 osd.179 up 1.00000 1.00000
    -169 20.00000 host r1sto08
      -189 20.00000 storage r1sto08sto1
        34 big 1.00000 osd.34 up 1.00000 1.00000
        56 big 1.00000 osd.56 up 1.00000 1.00000
        64 big 1.00000 osd.64 up 1.00000 1.00000
        68 big 1.00000 osd.68 up 1.00000 1.00000
        13 hdd 1.00000 osd.13 up 1.00000 1.00000
        29 hdd 1.00000 osd.29 up 1.00000 1.00000
        31 hdd 1.00000 osd.31 up 1.00000 1.00000
        41 hdd 1.00000 osd.41 up 1.00000 1.00000
        43 hdd 1.00000 osd.43 up 1.00000 1.00000
        46 hdd 1.00000 osd.46 up 1.00000 1.00000
        60 hdd 1.00000 osd.60 up 1.00000 1.00000
        62 hdd 1.00000 osd.62 up 1.00000 1.00000
        63 hdd 1.00000 osd.63 up 1.00000 1.00000
        4 ssd 1.00000 osd.4 up 1.00000 1.00000
        16 ssd 1.00000 osd.16 up 1.00000 1.00000
        21 ssd 1.00000 osd.21 up 1.00000 1.00000
        22 ssd 1.00000 osd.22 up 1.00000 1.00000
        24 ssd 1.00000 osd.24 up 1.00000 1.00000
        36 test 1.00000 osd.36 up 1.00000 1.00000
        53 test 1.00000 osd.53 up 1.00000 1.00000
  -9 41.04999 rack rack2
    -3 0.54999 host r2srv02
      -6 0.50000 storage r2srv02sto1
        38 big 0.09999 osd.38 up 1.00000 1.00000
        73 big 0.09999 osd.73 up 1.00000 1.00000
        82 big 0.09999 osd.82 up 1.00000 1.00000
        113 big 0.09999 osd.113 up 1.00000 1.00000
        120 big 0.09999 osd.120 up 1.00000 1.00000
      -7 0.04999 storage r2srv02sto2
        132 big 0.00999 osd.132 up 1.00000 1.00000
        137 big 0.00999 osd.137 up 1.00000 1.00000
        139 big 0.00999 osd.139 up 1.00000 1.00000
        141 big 0.00999 osd.141 up 1.00000 1.00000
        142 big 0.00999 osd.142 up 1.00000 1.00000
    -91 18.00000 host r2srv100
      -97 18.00000 storage r2srv100sto1
        121 big 0 osd.121 down 0 1.00000
        158 big 0 osd.158 up 1.00000 1.00000
        159 big 1.00000 osd.159 up 1.00000 1.00000
        176 big 1.00000 osd.176 up 1.00000 1.00000
        146 hdd 1.00000 osd.146 up 1.00000 1.00000
        148 hdd 1.00000 osd.148 up 1.00000 1.00000
        149 hdd 1.00000 osd.149 up 1.00000 1.00000
        150 hdd 1.00000 osd.150 up 1.00000 1.00000
        151 hdd 1.00000 osd.151 up 1.00000 1.00000
        152 hdd 1.00000 osd.152 up 1.00000 1.00000
        153 hdd 1.00000 osd.153 up 1.00000 1.00000
        154 hdd 1.00000 osd.154 up 1.00000 1.00000
        155 hdd 1.00000 osd.155 up 1.00000 1.00000
        25 ssd 1.00000 osd.25 up 1.00000 1.00000
        35 ssd 1.00000 osd.35 up 1.00000 1.00000
        51 ssd 1.00000 osd.51 up 1.00000 1.00000
        65 ssd 1.00000 osd.65 up 1.00000 1.00000
        144 ssd 1.00000 osd.144 up 1.00000 1.00000
        161 test 1.00000 osd.161 up 1.00000 1.00000
        162 test 1.00000 osd.162 up 1.00000 1.00000
    -65 2.50000 host r2srv15
      -67 1.25000 storage r2srv15sto1
        1 hdd 0.25000 osd.1 up 1.00000 1.00000
        6 hdd 0.25000 osd.6 up 1.00000 1.00000
        10 hdd 0.25000 osd.10 up 1.00000 1.00000
        15 hdd 0.25000 osd.15 up 1.00000 1.00000
        19 hdd 0.25000 osd.19 up 1.00000 1.00000
      -68 1.25000 storage r2srv15sto2
        27 hdd 0.25000 osd.27 up 1.00000 1.00000
        33 hdd 0.25000 osd.33 up 1.00000 1.00000
        44 hdd 0.25000 osd.44 up 1.00000 1.00000
        59 hdd 0.25000 osd.59 up 1.00000 1.00000
        79 hdd 0.25000 osd.79 up 1.00000 1.00000
    -170 20.00000 host r2sto08
      -188 20.00000 storage r2sto08sto1
        85 big 1.00000 osd.85 up 1.00000 1.00000
        96 big 1.00000 osd.96 up 1.00000 1.00000
        102 big 1.00000 osd.102 up 1.00000 1.00000
        104 big 1.00000 osd.104 up 1.00000 1.00000
        8 hdd 1.00000 osd.8 up 1.00000 1.00000
        81 hdd 1.00000 osd.81 up 1.00000 1.00000
        83 hdd 1.00000 osd.83 up 1.00000 1.00000
        87 hdd 1.00000 osd.87 up 1.00000 1.00000
        88 hdd 1.00000 osd.88 up 1.00000 1.00000
        92 hdd 1.00000 osd.92 up 1.00000 1.00000
        97 hdd 1.00000 osd.97 up 1.00000 1.00000
        98 hdd 1.00000 osd.98 up 1.00000 1.00000
        100 hdd 1.00000 osd.100 up 1.00000 1.00000
        3 ssd 1.00000 osd.3 up 1.00000 1.00000
        70 ssd 1.00000 osd.70 up 1.00000 1.00000
        74 ssd 1.00000 osd.74 up 1.00000 1.00000
        76 ssd 1.00000 osd.76 up 1.00000 1.00000
        78 ssd 1.00000 osd.78 up 1.00000 1.00000
        86 test 1.00000 osd.86 up 1.00000 1.00000
        94 test 1.00000 osd.94 up 1.00000 1.00000
  -10 42.04999 rack rack3
    -66 2.50000 host r3srv10
      -69 1.25000 storage r3srv10sto1
        48 hdd 0.25000 osd.48 up 1.00000 1.00000
        50 hdd 0.25000 osd.50 up 1.00000 1.00000
        58 hdd 0.25000 osd.58 up 1.00000 1.00000
        84 hdd 0.25000 osd.84 up 1.00000 1.00000
        101 hdd 0.25000 osd.101 up 1.00000 1.00000
      -70 1.25000 storage r3srv10sto2
        11 hdd 0.25000 osd.11 up 1.00000 1.00000
        20 hdd 0.25000 osd.20 up 1.00000 1.00000
        26 hdd 0.25000 osd.26 up 1.00000 1.00000
        37 hdd 0.25000 osd.37 up 1.00000 1.00000
        42 hdd 0.25000 osd.42 up 1.00000 1.00000
    -94 20.00000 host r3srv100
      -96 20.00000 storage r3srv100sto1
        112 big 1.00000 osd.112 up 1.00000 1.00000
        124 big 1.00000 osd.124 up 1.00000 1.00000
        156 big 1.00000 osd.156 up 1.00000 1.00000
        174 big 1.00000 osd.174 up 1.00000 1.00000
        55 hdd 1.00000 osd.55 up 1.00000 1.00000
        71 hdd 1.00000 osd.71 up 1.00000 1.00000
        89 hdd 1.00000 osd.89 up 1.00000 1.00000
        103 hdd 1.00000 osd.103 up 1.00000 1.00000
        105 hdd 1.00000 osd.105 up 1.00000 1.00000
        107 hdd 1.00000 osd.107 up 1.00000 1.00000
        109 hdd 1.00000 osd.109 up 1.00000 1.00000
        110 hdd 1.00000 osd.110 up 1.00000 1.00000
        111 hdd 1.00000 osd.111 up 1.00000 1.00000
        0 ssd 1.00000 osd.0 up 1.00000 1.00000
        23 ssd 1.00000 osd.23 up 1.00000 1.00000
        30 ssd 1.00000 osd.30 up 1.00000 1.00000
        40 ssd 1.00000 osd.40 up 1.00000 1.00000
        45 ssd 1.00000 osd.45 up 1.00000 1.00000
        126 test 1.00000 osd.126 up 1.00000 1.00000
        127 test 1.00000 osd.127 up 1.00000 1.00000
    -11 0.54999 host r3srv15
      -12 0.50000 storage r3srv15sto1
        77 big 0.09999 osd.77 up 1.00000 1.00000
        91 big 0.09999 osd.91 up 1.00000 1.00000
        95 big 0.09999 osd.95 up 1.00000 1.00000
        140 big 0.09999 osd.140 up 1.00000 1.00000
        143 big 0.09999 osd.143 up 1.00000 1.00000
      -13 0.04999 storage r3srv15sto2
        66 big 0.00999 osd.66 up 1.00000 1.00000
        75 big 0.00999 osd.75 up 1.00000 1.00000
        114 big 0.00999 osd.114 up 1.00000 1.00000
        118 big 0.00999 osd.118 up 1.00000 1.00000
        125 big 0.00999 osd.125 up 1.00000 1.00000
    -171 19.00000 host r3sto08
      -187 19.00000 storage r3sto08sto1
        123 big 1.00000 osd.123 up 1.00000 1.00000
        145 big 0 osd.145 down 0 1.00000
        168 big 1.00000 osd.168 up 1.00000 1.00000
        170 big 1.00000 osd.170 up 1.00000 1.00000
        12 hdd 1.00000 osd.12 up 1.00000 1.00000
        117 hdd 1.00000 osd.117 up 1.00000 1.00000
        122 hdd 1.00000 osd.122 up 1.00000 1.00000
        130 hdd 1.00000 osd.130 up 1.00000 1.00000
        133 hdd 1.00000 osd.133 up 1.00000 1.00000
        136 hdd 1.00000 osd.136 up 1.00000 1.00000
        147 hdd 1.00000 osd.147 up 1.00000 1.00000
        163 hdd 1.00000 osd.163 up 1.00000 1.00000
        166 hdd 1.00000 osd.166 up 1.00000 1.00000
        7 ssd 1.00000 osd.7 up 1.00000 1.00000
        106 ssd 1.00000 osd.106 up 1.00000 1.00000
        108 ssd 1.00000 osd.108 up 1.00000 1.00000
        115 ssd 1.00000 osd.115 up 1.00000 1.00000
        116 ssd 1.00000 osd.116 up 1.00000 1.00000
        129 test 1.00000 osd.129 up 1.00000 1.00000
        138 test 1.00000 osd.138 up 1.00000 1.00000


At the time I changed the rule, there was no 'down' PG: all PGs in the
cluster were 'active', plus possibly some other state (remapped,
degraded, whatever), as I had added some new disk servers a few days before.

Never make crush rule changes when any PG is degraded, remapped, or whatever!
They must all be active+clean to consider big changes like injecting a
new crush rule!!
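
(A simple pre-flight check for that: before touching crush,

ceph --cluster cephpa1 pg stat

should report every PG in a single active+clean bucket, e.g. "NNNN pgs: NNNN active+clean".)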

Ok, now I think I learned it. In my mind it was a sort of optimization: as I was moving stuff around due to the additional servers, why not at the same time update the crush rule?
Will remember the lesson for the future.

  Thanks!

			Fulvio

--
Fulvio Galeazzi
GARR-CSD Department
tel.: +39-334-6533-250
skype: fgaleazzi70
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
