Re: Can't start osd- one osd alway be down.


 



My Ceph cluster hung, and the log reported: "osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8 has 4 objects unfound and apparently lost".

After I restarted all the Ceph data nodes, I could not start osd.21. Its log contains many entries about pg 6.9d8, such as:

 -440> 2014-10-25 19:28:17.468161 7fec5731d700  5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.000000000000e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.000000000000e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think some objects are damaged. What should I do, please?
Thanks!
--
Tuan
HaNoi-VietNam


On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:
Here are some related log entries
(osd.21 cannot be started):

 -8705> 2014-10-25 14:41:04.345727 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.00000000000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

 -1637> 2014-10-25 14:41:14.326580 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

  -437> 2014-10-25 14:41:15.042174 7f12ba42e700  5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering
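
For reference, a minimal Python sketch for reading per-PG state lines like the ones above. It pulls out the PG id, the up and acting OSD sets, the role, and the state. The regular expression and the names `PG_LINE_RE` and `parse_pg_line` are illustrative assumptions based only on the lines quoted here, not a Ceph API; it also assumes that a line showing a single bracketed OSD list means the up and acting sets are identical.

```python
import re

# Assumed layout, inferred only from the log lines quoted in this thread:
#   pg[<pgid>( ... ) [up]/[acting] r=<role> ... <state>] enter|exit <state machine path>
PG_LINE_RE = re.compile(
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*?\) "
    r"\[(?P<up>[\d,]+)\](?:/\[(?P<acting>[\d,]+)\])? "
    r"r=(?P<role>-?\d+) .*? (?P<state>[a-z+]+)\] "
    r"(?P<event>enter|exit) (?P<fsm>\S+)"
)


def _ids(text):
    """Turn '40,0,21' into [40, 0, 21]."""
    return [int(x) for x in text.split(",")]


def parse_pg_line(line):
    """Return a dict describing one PG state line, or None if it does not match."""
    m = PG_LINE_RE.search(line)
    if m is None:
        return None
    up = _ids(m.group("up"))
    # Assumption: when only one list is printed, acting equals up.
    acting = _ids(m.group("acting")) if m.group("acting") else list(up)
    return {
        "pgid": m.group("pgid"),
        "up": up,
        "acting": acting,
        "role": int(m.group("role")),
        "state": m.group("state"),
        "event": m.group("event"),
        "fsm": m.group("fsm"),
    }


# Two of the lines quoted above, copied verbatim.
LINE_6_5E1 = (
    "2014-10-25 14:41:04.345727 7f12bac2f700  5 osd.21 pg_epoch: 102843 "
    "pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] "
    "lb c4951de1/rbd_data.3955c5cdbb2ea.00000000000405f0/head//6 "
    "local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) "
    "[40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 "
    "crt=102843'11832157 lcod 102843'11832158 active+remapped] "
    "exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296"
)
LINE_2_23B = (
    "2014-10-25 14:41:14.326580 7f12bac2f700  5 osd.21 pg_epoch: 102843 "
    "pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 "
    "n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 "
    "lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] "
    "enter Started/ReplicaActive/RepNotRecovering"
)

if __name__ == "__main__":
    for line in (LINE_6_5E1, LINE_2_23B):
        info = parse_pg_line(line)
        in_acting = 21 in info["acting"]
        print(info["pgid"], info["state"], "osd.21 in acting set:", in_acting)
```

On the first line, osd.21 is in the up set but not in the acting set, which matches the "active+remapped" state; on the second it is in both.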

Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:
Hi Craig, thanks for replying.
When I started that OSD, the log from "ceph -w" warned that pgs 7.9d8, 23.596, 23.9c6 and 23.63c could not recover, as in the log pasted below.

Those pgs are in the "active+degraded" state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

When I start osd.21, pg 7.9d8 and the other three pgs change to the "active+recovering" state, but osd.21 goes down again after logging the following:


2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0000000000000c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0000000000000c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0000000000000385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0000000000000385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0000000000000b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0000000000000b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for > 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0000000000000c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0000000000000c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
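
To see at a glance which pgs and objects the blocked pushes refer to, warnings like these can be summarized with a short script. The following is only a sketch: the regular expression is inferred from the "slow request ... MOSDPGPush(...)" lines in this message, the names `SLOW_PUSH_RE`, `parse_slow_pushes` and `group_by_pg` are hypothetical, and it assumes each warning sits on one physical line. The two sample strings are copied from the warnings above and trimmed after the version field.

```python
import re
from collections import defaultdict

# Assumed layout of a slow-request warning for a recovery push:
#   ... slow request <age> seconds old, received at <time>: MOSDPGPush(<pgid> <epoch> [PushOp(<object>, ...
SLOW_PUSH_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, .*?"
    r"MOSDPGPush\((?P<pgid>[0-9a-f]+\.[0-9a-f]+) (?P<epoch>\d+) "
    r"\[PushOp\((?P<object>[^,]+),"
)

# Sample input taken from the warnings above (trimmed after the version field).
SAMPLE = [
    "2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, "
    "received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 "
    "[PushOp(e13589d8/rbd_data.4b843b2ae8944a.0000000000000c00/head//6, "
    "version: 102798'7794851,",
    "2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, "
    "received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 "
    "[PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0000000000000385/head//24, "
    "version: 102798'295732,",
]


def parse_slow_pushes(lines):
    """Extract age, pg id, epoch and object name from matching warning lines."""
    records = []
    for line in lines:
        m = SLOW_PUSH_RE.search(line)
        if m is None:
            continue  # not a slow MOSDPGPush warning
        records.append({
            "age": float(m.group("age")),
            "pgid": m.group("pgid"),
            "epoch": int(m.group("epoch")),
            "object": m.group("object"),
        })
    return records


def group_by_pg(records):
    """Map each pg id to the list of objects whose pushes were reported slow."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["pgid"]].append(rec["object"])
    return dict(grouped)


if __name__ == "__main__":
    for pgid, objects in group_by_pg(parse_slow_pushes(SAMPLE)).items():
        print(pgid, objects)
```

In practice the same functions could be fed the lines of a saved "ceph -w" capture instead of `SAMPLE`.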

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 05:07 AM, Craig Lewis wrote:
It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug.  From my reading of the code, I believe the fix only prevents the issue from occurring.  It doesn't work around or repair bad snapshots created on older versions of Ceph.

Were any of the snapshots you're removing created on older versions of Ceph?  If they were all created on Firefly, then you should open a new tracker issue, and try to get some help on IRC or the developers mailing list.
 

On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan <tuantb@xxxxxxxxxx> wrote:
Dear everyone,

I can't start osd.21 (log file attached).
Some pgs can't be repaired. I'm using 3 replicas for my data pool.
I suspect some objects in those pgs are damaged.

I tried deleting some data related to those objects, but osd.21 still would not start.
I then removed osd.21, but other OSDs were affected (e.g. osd.86 went down and will not start).

Please guide me in debugging this. Thanks!

--
Tuan
Ha Noi - VietNam










_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




