Hi all,
I hope somebody can help me. I have a home Ceph installation.
After a power failure (which can happen in a datacenter too), my cluster booted into an inconsistent state.
I was backfilling data onto one new disk when the power failed. The first time it booted without some OSDs, but I fixed that. Now all my OSDs are running, but after some time the cluster state looks like this:
cluster:
id: 2d9bf17f-3d50-4a59-8359-abc8328fe801
health: HEALTH_WARN
1 filesystem is degraded
1 filesystem has a failed mds daemon
noout,nodeep-scrub flag(s) set
no active mgr
317162/12520262 objects misplaced (2.533%)
Reduced data availability: 52 pgs inactive, 29 pgs down, 1 pg peering, 1 pg stale
Degraded data redundancy: 2099528/12520262 objects degraded (16.769%), 427 pgs unclean, 368 pgs degraded, 368 pgs undersized
1/3 mons down, quorum imatic-mce-2,imatic-mce
services:
mon: 3 daemons, quorum imatic-mce-2,imatic-mce, out of quorum: obyvak
mgr: no daemons active
mds: cephfs-0/1/1 up, 1 failed
osd: 8 osds: 8 up, 8 in; 61 remapped pgs
flags noout,nodeep-scrub
data:
pools: 8 pools, 896 pgs
objects: 4446k objects, 9119 GB
usage: 9698 GB used, 2290 GB / 11988 GB avail
pgs: 2.455% pgs unknown
3.348% pgs not active
2099528/12520262 objects degraded (16.769%)
317162/12520262 objects misplaced (2.533%)
371 stale+active+clean
183 active+undersized+degraded
154 stale+active+undersized+degraded
85 active+clean
22 unknown
19 stale+down
14 stale+active+undersized+degraded+remapped+backfill_wait
13 active+undersized+degraded+remapped+backfill_wait
10 down
6 active+clean+remapped
6 stale+active+clean+remapped
5 stale+active+remapped+backfill_wait
2 active+remapped+backfill_wait
2 stale+active+undersized+degraded+remapped+backfilling
1 active+undersized+degraded+remapped
1 active+undersized+degraded+remapped+backfilling
1 stale+peering
1 stale+active+clean+scrubbing
All OSDs are up and running. Before that I had run
ceph osd out
on one of my disks and then removed that disk from the cluster because I don't want to use it anymore (roughly the usual removal steps; see the sketch below). That triggered a CRUSH reweight and the cluster started rebalancing my data. I think that should not have put my data in danger, even though I saw that some of my PGs were undersized (why?) - but that is not the main issue right now.
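For reference, this is roughly the removal sequence I followed (the standard steps as I understand them; osd.N and the systemd unit name are placeholders here, not the exact IDs I used):

  # mark the OSD out so its data is migrated to the remaining disks
  ceph osd out osd.N
  # once rebalancing is done, stop the daemon and remove it from CRUSH, auth and the OSD map
  systemctl stop ceph-osd@N
  ceph osd crush remove osd.N
  ceph auth del osd.N
  ceph osd rm osd.N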
When I try to run
ceph pg dump
I get no response at all.
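My guess is that this hangs because there is no active mgr (as far as I know, PG stats are served via the mgr since Luminous), so maybe I first need to get a mgr running again, something like this (the hostname in the systemd unit is just an assumption - it would be whichever of my hosts has ceph-mgr installed):

  # see whether any mgr daemons are known to the cluster
  ceph mgr dump
  # (re)start a mgr instance on one of the mon hosts
  systemctl restart ceph-mgr@imatic-mce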
But ceph osd dump shows a weird OSD number in some temporary PG mappings: 2147483647 (that is 2^31 - 1, the maximum signed 32-bit integer, so it looks like a placeholder rather than a real OSD ID). I think there is some problem in a mon or some other database and the peering process cannot complete.
What can I do next? I trusted this cluster enough to keep data on it that I want back. Thank you very much for your help.
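If more output would help, I can also post things like the following (10.5b is just one example of a pg_temp entry below that shows the strange ID; if I understand it correctly, pg query goes to the primary OSD directly and should still work even with the mgr down):

  ceph health detail
  ceph osd tree
  ceph pg 10.5b query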
My ceph osd dump looks like this:
epoch 29442
fsid 2d9bf17f-3d50-4a59-8359-abc8328fe801
created 2014-12-10 23:00:49.140787
modified 2017-12-11 18:54:01.134091
flags noout,nodeep-scrub,sortbitwise,recovery_deletes
crush_version 14
full_ratio 0.97
backfillfull_ratio 0.91
nearfull_ratio 0.9
require_min_compat_client firefly
min_compat_client firefly
require_osd_release luminous
pool 0 'data' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 27537 flags hashpspool
crash_replay_interval 45 min_read_recency_for_promote 1
min_write_recency_for_promote 1 stripe_width 0 application cephfs
pool 1 'metadata' replicated size 3 min_size 1 crush_rule 1 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 27537 flags hashpspool
min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0 application cephfs
pool 2 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 28088 flags hashpspool
min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0 application rbd
removed_snaps [1~5]
pool 3 'nonreplicated' replicated size 1 min_size 1 crush_rule 2
object_hash rjenkins pg_num 192 pgp_num 192 last_change 27537 flags
hashpspool min_read_recency_for_promote 1 min_write_recency_for_promote
1 stripe_width 0 application cephfs
pool 4 'replicated' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 192 pgp_num 192 last_change 27537 lfor
17097/17097 flags hashpspool min_read_recency_for_promote 1
min_write_recency_for_promote 1 stripe_width 0 application cephfs
pool 10 'erasure_3_1' erasure size 4 min_size 3 crush_rule 3 object_hash
rjenkins pg_num 128 pgp_num 128 last_change 27537 lfor 9127/9127 flags
hashpspool tiers 11 read_tier 11 write_tier 11
min_write_recency_for_promote 1 stripe_width 4128 application cephfs
pool 11 'erasure_3_1_hot' replicated size 2 min_size 1 crush_rule 1
object_hash rjenkins pg_num 128 pgp_num 128 last_change 9910 flags
hashpspool,incomplete_clones tier_of 10 cache_mode writeback
target_bytes 5368709120 hit_set bloom{false_positive_probability: 0.05,
target_size: 0, seed: 0} 0s x0 decay_rate 0 search_last_n 1
min_write_recency_for_promote 1 stripe_width 0
pool 12 'test' replicated size 1 min_size 1 crush_rule 4 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 27463 flags hashpspool
stripe_width 0
max_osd 8
osd.0 up in weight 1 up_from 29416 up_thru 29433 down_at 29407
last_clean_interval [29389,29406) 192.168.11.165:6800/9273
192.168.11.165:6801/9273 192.168.11.165:6802/9273
192.168.11.165:6803/9273 exists,up 630fe0dc-9ec0-456a-bf15-51d6d3ba462d
osd.1 up in weight 1 up_from 29422 up_thru 29437 down_at 29407
last_clean_interval [29390,29406) 192.168.11.165:6816/9336
192.168.11.165:6817/9336 192.168.11.165:6818/9336
192.168.11.165:6819/9336 exists,up ef583c8d-171f-47c4-8a9e-e9eb913cb272
osd.2 up in weight 1 up_from 29409 up_thru 29433 down_at 29407
last_clean_interval [29389,29406) 192.168.11.165:6804/9285
192.168.11.165:6805/9285 192.168.11.165:6806/9285
192.168.11.165:6807/9285 exists,up 1de26ef5-319d-426e-ad75-65aedbbd0328
osd.3 up in weight 1 up_from 29430 up_thru 29439 down_at 29410
last_clean_interval [29391,29406) 192.168.11.165:6824/12146
192.168.11.165:6825/12146 192.168.11.165:6826/12146
192.168.11.165:6827/12146 exists,up 5b63a084-cb0c-4e6a-89c1-9d2fc70cea02
osd.4 up in weight 1 up_from 29442 up_thru 0 down_at 29347
last_clean_interval [29317,29343) 192.168.11.165:6828/15193
192.168.11.165:6829/15193 192.168.11.165:6830/15193
192.168.11.165:6831/15193 exists,up ee9d758d-f2df-41b6-9320-ce89f54c116b
osd.5 up in weight 1 up_from 29414 up_thru 29431 down_at 29413
last_clean_interval [29390,29406) 192.168.11.165:6812/9321
192.168.11.165:6813/9321 192.168.11.165:6814/9321
192.168.11.165:6815/9321 exists,up d1077d42-2c92-4afd-a11e-02fdd59b393b
osd.6 up in weight 1 up_from 29413 up_thru 29433 down_at 29407
last_clean_interval [29390,29406) 192.168.11.165:6820/9345
192.168.11.165:6821/9345 192.168.11.165:6822/9345
192.168.11.165:6823/9345 exists,up f55da9e5-0c03-43fa-af59-56add845c706
osd.7 up in weight 1 up_from 29422 up_thru 29433 down_at 29407
last_clean_interval [29389,29406) 192.168.11.165:6808/9309
192.168.11.165:6809/9309 192.168.11.165:6810/9309
192.168.11.165:6811/9309 exists,up 1e75647b-a1fc-4672-957f-ce5c2b0f4a43
pg_temp 0.0 [0,5]
pg_temp 0.2c [0,6]
pg_temp 0.31 [7,5]
pg_temp 0.33 [0,2]
pg_temp 1.3 [1,6,5]
pg_temp 1.1d [1,3,2]
pg_temp 1.27 [6,5,2]
pg_temp 1.2c [3,1,5]
pg_temp 1.36 [6,3,2]
pg_temp 1.37 [6,1,5]
pg_temp 1.38 [6,3,1]
pg_temp 1.3f [3,5,1]
pg_temp 4.0 [3,6]
pg_temp 4.4 [0,6]
pg_temp 4.9 [0,7]
pg_temp 4.13 [6,3]
pg_temp 4.1a [6,5]
pg_temp 4.41 [6,2]
pg_temp 4.4b [0,1]
pg_temp 4.5c [6,0]
pg_temp 4.76 [0,6]
pg_temp 4.87 [0,7]
pg_temp 4.9d [0,3]
pg_temp 4.a9 [6,7]
pg_temp 10.1 [0,7,2,1]
pg_temp 10.2 [3,0,1,7]
pg_temp 10.5 [2,6,3,0]
pg_temp 10.7 [6,7,0,3]
pg_temp 10.a [0,3,5,6]
pg_temp 10.c [7,6,5,0]
pg_temp 10.f [7,3,6,0]
pg_temp 10.1c [6,1,0,3]
pg_temp 10.25 [0,7,5,3]
pg_temp 10.26 [3,2,5,1]
pg_temp 10.29 [7,5,2,0]
pg_temp 10.2f [1,5,7,0]
pg_temp 10.3b [7,3,0,6]
pg_temp 10.41 [0,1,3,7]
pg_temp 10.47 [3,0,7,5]
pg_temp 10.4c [7,3,5,1]
pg_temp 10.51 [1,7,0,2]
pg_temp 10.54 [3,0,7,5]
pg_temp 10.55 [7,1,0,3]
pg_temp 10.5a [7,0,3,2]
pg_temp 10.5b [2147483647,0,5,6]
pg_temp 10.5e [7,2,3,0]
pg_temp 10.5f [6,3,2,1]
pg_temp 10.63 [7,5,3,2]
pg_temp 10.64 [3,6,7,1]
pg_temp 10.66 [0,1,5,7]
pg_temp 10.6c [7,3,2,0]
pg_temp 10.6f [6,3,2,7]
pg_temp 10.70 [0,7,6,1]
pg_temp 10.72 [7,6,1,3]
pg_temp 10.73 [6,2147483647,7,0]
pg_temp 10.74 [7,1,6,3]
pg_temp 10.7c [0,6,1,7]
pg_temp 10.7f [0,7,1,2]
pg_temp 11.27 [6,5]
pg_temp 11.38 [3,6]
pg_temp 11.78 [3,1]
Thank you all for any help. It is important to me.
Jan Pekar