My Ceph cluster was hung, reporting "osd.21
172.30.5.2:6870/8047 879 : [ERR] 6.9d8 has 4 objects unfound and
apparently lost".
After I restarted all the ceph-data nodes, I can't start osd.21; it
logs many messages about pg 6.9d8 such as:
-440> 2014-10-25 19:28:17.468161 7fec5731d700 5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.000000000000e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.000000000000e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
I think there are some bad objects. What must I do, please?
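(For reference, unfound objects can be inspected, and as a last
resort reverted, with the standard pg commands; a minimal sketch,
assuming a Firefly-era CLI:

# ceph health detail                       # lists the pgs that have unfound objects
# ceph pg 6.9d8 list_missing               # shows which objects are unfound and which OSDs were probed
# ceph pg 6.9d8 mark_unfound_lost revert   # last resort: roll unfound objects back to a prior version

"mark_unfound_lost revert" discards the lost writes, so it only makes
sense after every OSD that might hold the objects has been brought up
and probed.)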
Thanks!
--
Tuan
HaNoi-VietNam
On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:
I'm sending some logs related to the bug (osd.21 cannot be started):
-8705> 2014-10-25 14:41:04.345727 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.00000000000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296
-1637> 2014-10-25 14:41:14.326580 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering
-437> 2014-10-25 14:41:15.042174 7f12ba42e700 5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering
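(For reference, more detail on why the daemon exits can be captured
by raising debug levels at runtime; a minimal sketch with injectargs,
assuming osd.21 stays up long enough to answer:

# ceph tell osd.21 injectargs '--debug-osd 20 --debug-ms 1'

If the daemon dies before it is reachable, the same values can be set
under [osd.21] in ceph.conf before restarting it.)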
Thanks!
On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:
Hi Craig, thanks for replying.
When I started that OSD, the log from "ceph -w" warned that pgs
7.9d8, 23.596, 23.9c6, and 23.63c can't recover, as in the pasted
log. Those pgs are in the "active+degraded" state.
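(For reference, querying one of the stuck pgs shows its recovery
state machine and which OSDs are still being probed; a minimal
sketch:

# ceph pg 7.9d8 query)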
# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]
(When osd.21 starts, pg 7.9d8 and the three remaining pgs change to
the "active+recovering" state.) osd.21 still goes down after the
following logs:
2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0000000000000c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0000000000000c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0000000000000385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0000000000000385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0000000000000b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0000000000000b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for > 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0000000000000c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0000000000000c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
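(For reference, the requests behind these slow-request warnings can
be dumped through the OSD's admin socket on the node hosting osd.21;
a minimal sketch, assuming the default socket path:

# ceph daemon osd.21 dump_ops_in_flight    # ops currently blocked, with their event timelines
# ceph daemon osd.21 dump_historic_ops     # recently completed slow ops)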
Thanks!
--
Tuan
HaNoi-VietNam
On 10/25/2014 05:07 AM, Craig Lewis wrote:
It looks like you're running into http://tracker.ceph.com/issues/5699
You're running 0.80.7, which has a fix for that bug.
From my reading of the code, I believe the fix only
prevents the issue from occurring. It doesn't work around
or repair bad snapshots created on older versions of Ceph.
Were any of the snapshots you're removing created on older versions
of Ceph? If they were all created on Firefly, then you should open a
new tracker issue and try to get some help on IRC or on the
developers' mailing list.
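(For reference, a quick way to confirm which version every daemon is
actually running, in case some OSDs were never restarted onto 0.80.7;
a minimal sketch:

# ceph tell osd.* version)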