Hi all,
I hope somebody can help me. I have a home Ceph installation.
After a power failure (which can happen in a datacenter too), my cluster booted into an inconsistent state.
I was backfilling data onto one new disk when the power failed. The first time it booted without some OSDs, but I fixed that. Now all my OSDs are running, but after some time the cluster state looks like this:
cluster:
id: 2d9bf17f-3d50-4a59-8359-abc8328fe801
health: HEALTH_WARN
1 filesystem is degraded
1 filesystem has a failed mds daemon
noout,nodeep-scrub flag(s) set
no active mgr
317162/12520262 objects misplaced (2.533%)
Reduced data availability: 52 pgs inactive, 29 pgs down, 1 pg peering, 1 pg stale
Degraded data redundancy: 2099528/12520262 objects degraded (16.769%), 427 pgs unclean, 368 pgs degraded, 368 pgs undersized
1/3 mons down, quorum imatic-mce-2,imatic-mce
services:
mon: 3 daemons, quorum imatic-mce-2,imatic-mce, out of quorum: obyvak
mgr: no daemons active
mds: cephfs-0/1/1 up, 1 failed
osd: 8 osds: 8 up, 8 in; 61 remapped pgs
flags noout,nodeep-scrub
data:
pools: 8 pools, 896 pgs
objects: 4446k objects, 9119 GB
usage: 9698 GB used, 2290 GB / 11988 GB avail
pgs: 2.455% pgs unknown
3.348% pgs not active
2099528/12520262 objects degraded (16.769%)
317162/12520262 objects misplaced (2.533%)
371 stale+active+clean
183 active+undersized+degraded
154 stale+active+undersized+degraded
85 active+clean
22 unknown
19 stale+down
14 stale+active+undersized+degraded+remapped+backfill_wait
13 active+undersized+degraded+remapped+backfill_wait
10 down
6 active+clean+remapped
6 stale+active+clean+remapped
5 stale+active+remapped+backfill_wait
2 active+remapped+backfill_wait
2 stale+active+undersized+degraded+remapped+backfilling
1 active+undersized+degraded+remapped
1 active+undersized+degraded+remapped+backfilling
1 stale+peering
1 stale+active+clean+scrubbing
All OSDs are up and running. Before that I had run
ceph osd out
on one of my disks and then removed that disk from the cluster because I don't want to use it anymore (roughly the usual removal steps; see the sketch below). That triggered a CRUSH reweight and the cluster started rebalancing my data. I think that should not have put my data in danger, even though I saw that some of my PGs were undersized (why?) - but that is not the main issue right now.
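For reference, this is roughly the removal sequence I followed (the standard steps as I understand them; osd.N and the systemd unit name are placeholders here, not the exact IDs I used):

  # mark the OSD out so its data is migrated to the remaining disks
  ceph osd out osd.N
  # once rebalancing is done, stop the daemon and remove it from CRUSH, auth and the OSD map
  systemctl stop ceph-osd@N
  ceph osd crush remove osd.N
  ceph auth del osd.N
  ceph osd rm osd.N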
When I try to run
ceph pg dump
I get no response at all.
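My guess is that this hangs because there is no active mgr (as far as I know, PG stats are served via the mgr since Luminous), so maybe I first need to get a mgr running again, something like this (the hostname in the systemd unit is just an assumption - it would be whichever of my hosts has ceph-mgr installed):

  # see whether any mgr daemons are known to the cluster
  ceph mgr dump
  # (re)start a mgr instance on one of the mon hosts
  systemctl restart ceph-mgr@imatic-mce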
But ceph osd dump shows a weird OSD number in some temporary PG mappings: 2147483647 (that is 2^31 - 1, the maximum signed 32-bit integer, so it looks like a placeholder rather than a real OSD ID). I think there is some problem in a mon or some other database and the peering process cannot complete.
What can I do next? I trusted this cluster enough to keep data on it that I want back. Thank you very much for your help.
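If more output would help, I can also post things like the following (10.5b is just one example of a pg_temp entry below that shows the strange ID; if I understand it correctly, pg query goes to the primary OSD directly and should still work even with the mgr down):

  ceph health detail
  ceph osd tree
  ceph pg 10.5b query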
My ceph osd dump looks like this:
epoch 29442
fsid 2d9bf17f-3d50-4a59-8359-abc8328fe801
created 2014-12-10 23:00:49.140787
modified 2017-12-11 18:54:01.134091
flags noout,nodeep-scrub,sortbitwise,recovery_deletes
crush_version 14
full_ratio 0.97
backfillfull_ratio 0.91
nearfull_ratio 0.9
require_min_compat_client firefly
min_compat_client firefly
require_osd_release luminous
pool 0 'data' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 27537 flags hashpspool
crash_replay_interval 45 min_read_recency_for_promote 1
min_write_recency_for_promote 1 stripe_width 0 application cephfs
pool 1 'metadata' replicated size 3 min_size 1 crush_rule 1 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 27537 flags hashpspool
min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0 application cephfs
pool 2 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 28088 flags hashpspool
min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0 application rbd
removed_snaps [1~5]
pool 3 'nonreplicated' replicated size 1 min_size 1 crush_rule 2
object_hash rjenkins pg_num 192 pgp_num 192 last_change 27537 flags
hashpspool min_read_recency_for_promote 1 min_write_recency_for_promote
1 stripe_width 0 application cephfs
pool 4 'replicated' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 192 pgp_num 192 last_change 27537 lfor
17097/17097 flags hashpspool min_read_recency_for_promote 1
min_write_recency_for_promote 1 stripe_width 0 application cephfs
pool 10 'erasure_3_1' erasure size 4 min_size 3 crush_rule 3 object_hash
rjenkins pg_num 128 pgp_num 128 last_change 27537 lfor 9127/9127 flags
hashpspool tiers 11 read_tier 11 write_tier 11
min_write_recency_for_promote 1 stripe_width 4128 application cephfs
pool 11 'erasure_3_1_hot' replicated size 2 min_size 1 crush_rule 1
object_hash rjenkins pg_num 128 pgp_num 128 last_change 9910 flags
hashpspool,incomplete_clones tier_of 10 cache_mode writeback
target_bytes 5368709120 hit_set bloom{false_positive_probability: 0.05,
target_size: 0, seed: 0} 0s x0 decay_rate 0 search_last_n 1
min_write_recency_for_promote 1 stripe_width 0
pool 12 'test' replicated size 1 min_size 1 crush_rule 4 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 27463 flags hashpspool
stripe_width 0
max_osd 8
osd.0 up in weight 1 up_from 29416 up_thru 29433 down_at 29407
last_clean_interval [29389,29406) 192.168.11.165:6800/9273
192.168.11.165:6801/9273 192.168.11.165:6802/9273
192.168.11.165:6803/9273 exists,up 630fe0dc-9ec0-456a-bf15-51d6d3ba462d
osd.1 up in weight 1 up_from 29422 up_thru 29437 down_at 29407
last_clean_interval [29390,29406) 192.168.11.165:6816/9336
192.168.11.165:6817/9336 192.168.11.165:6818/9336
192.168.11.165:6819/9336 exists,up ef583c8d-171f-47c4-8a9e-e9eb913cb272
osd.2 up in weight 1 up_from 29409 up_thru 29433 down_at 29407
last_clean_interval [29389,29406) 192.168.11.165:6804/9285
192.168.11.165:6805/9285 192.168.11.165:6806/9285
192.168.11.165:6807/9285 exists,up 1de26ef5-319d-426e-ad75-65aedbbd0328
osd.3 up in weight 1 up_from 29430 up_thru 29439 down_at 29410
last_clean_interval [29391,29406) 192.168.11.165:6824/12146
192.168.11.165:6825/12146 192.168.11.165:6826/12146
192.168.11.165:6827/12146 exists,up 5b63a084-cb0c-4e6a-89c1-9d2fc70cea02
osd.4 up in weight 1 up_from 29442 up_thru 0 down_at 29347
last_clean_interval [29317,29343) 192.168.11.165:6828/15193
192.168.11.165:6829/15193 192.168.11.165:6830/15193
192.168.11.165:6831/15193 exists,up ee9d758d-f2df-41b6-9320-ce89f54c116b
osd.5 up in weight 1 up_from 29414 up_thru 29431 down_at 29413
last_clean_interval [29390,29406) 192.168.11.165:6812/9321
192.168.11.165:6813/9321 192.168.11.165:6814/9321
192.168.11.165:6815/9321 exists,up d1077d42-2c92-4afd-a11e-02fdd59b393b
osd.6 up in weight 1 up_from 29413 up_thru 29433 down_at 29407
last_clean_interval [29390,29406) 192.168.11.165:6820/9345
192.168.11.165:6821/9345 192.168.11.165:6822/9345
192.168.11.165:6823/9345 exists,up f55da9e5-0c03-43fa-af59-56add845c706
osd.7 up in weight 1 up_from 29422 up_thru 29433 down_at 29407
last_clean_interval [29389,29406) 192.168.11.165:6808/9309
192.168.11.165:6809/9309 192.168.11.165:6810/9309
192.168.11.165:6811/9309 exists,up 1e75647b-a1fc-4672-957f-ce5c2b0f4a43
pg_temp 0.0 [0,5]
pg_temp 0.2c [0,6]
pg_temp 0.31 [7,5]
pg_temp 0.33 [0,2]
pg_temp 1.3 [1,6,5]
pg_temp 1.1d [1,3,2]
pg_temp 1.27 [6,5,2]
pg_temp 1.2c [3,1,5]
pg_temp 1.36 [6,3,2]
pg_temp 1.37 [6,1,5]
pg_temp 1.38 [6,3,1]
pg_temp 1.3f [3,5,1]
pg_temp 4.0 [3,6]
pg_temp 4.4 [0,6]
pg_temp 4.9 [0,7]
pg_temp 4.13 [6,3]
pg_temp 4.1a [6,5]
pg_temp 4.41 [6,2]
pg_temp 4.4b [0,1]
pg_temp 4.5c [6,0]
pg_temp 4.76 [0,6]
pg_temp 4.87 [0,7]
pg_temp 4.9d [0,3]
pg_temp 4.a9 [6,7]
pg_temp 10.1 [0,7,2,1]
pg_temp 10.2 [3,0,1,7]
pg_temp 10.5 [2,6,3,0]
pg_temp 10.7 [6,7,0,3]
pg_temp 10.a [0,3,5,6]
pg_temp 10.c [7,6,5,0]
pg_temp 10.f [7,3,6,0]
pg_temp 10.1c [6,1,0,3]
pg_temp 10.25 [0,7,5,3]
pg_temp 10.26 [3,2,5,1]
pg_temp 10.29 [7,5,2,0]
pg_temp 10.2f [1,5,7,0]
pg_temp 10.3b [7,3,0,6]
pg_temp 10.41 [0,1,3,7]
pg_temp 10.47 [3,0,7,5]
pg_temp 10.4c [7,3,5,1]
pg_temp 10.51 [1,7,0,2]
pg_temp 10.54 [3,0,7,5]
pg_temp 10.55 [7,1,0,3]
pg_temp 10.5a [7,0,3,2]
pg_temp 10.5b [2147483647,0,5,6]
pg_temp 10.5e [7,2,3,0]
pg_temp 10.5f [6,3,2,1]
pg_temp 10.63 [7,5,3,2]
pg_temp 10.64 [3,6,7,1]
pg_temp 10.66 [0,1,5,7]
pg_temp 10.6c [7,3,2,0]
pg_temp 10.6f [6,3,2,7]
pg_temp 10.70 [0,7,6,1]
pg_temp 10.72 [7,6,1,3]
pg_temp 10.73 [6,2147483647,7,0]
pg_temp 10.74 [7,1,6,3]
pg_temp 10.7c [0,6,1,7]
pg_temp 10.7f [0,7,1,2]
pg_temp 11.27 [6,5]
pg_temp 11.38 [3,6]
pg_temp 11.78 [3,1]
Thank you all for any help. It is important to me.
Jan Pekar