Re: Can't start osd - one osd is always down.


Hi Craig, thanks for replying.
When I started that osd, the Ceph log from "ceph -w" warned that pgs 7.9d8, 23.596, 23.9c6, and 23.63c can't recover, as in the log pasted below.

Those pgs are in the "active+degraded" state.
# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When osd.21 is started, pg 7.9d8 and the three remaining pgs change to the "active+recovering" state.) osd.21 still goes down after the following log entries:
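
For reference, more detail on those stuck pgs can be pulled with the standard commands below (just the stock Firefly CLI, nothing specific to this cluster; the exact output will differ):

# ceph health detail
# ceph pg 7.9d8 query
# ceph pg dump_stuck unclean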


2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0000000000000c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0000000000000c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0000000000000385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0000000000000385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0000000000000b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0000000000000b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for > 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0000000000000c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0000000000000c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
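
For reference, the blocked ops behind these warnings can usually be inspected through the OSD's admin socket on the node hosting osd.21 (a general sketch assuming the default socket path; adjust to your layout):

# ceph daemon osd.21 dump_ops_in_flight
# ceph --admin-daemon /var/run/ceph/ceph-osd.21.asok dump_historic_ops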

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 05:07 AM, Craig Lewis wrote:
It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug.  From my reading of the code, I believe the fix only prevents the issue from occurring.  It doesn't work around or repair bad snapshots created on older versions of Ceph.

Were any of the snapshots you're removing created on older versions of Ceph? If they were all created on Firefly, then you should open a new tracker issue and try to get some help on IRC or the developers mailing list.
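
A quick way to confirm what the daemons are running right now (this only shows the currently running versions, not which version originally created a snapshot):

# ceph --version
# ceph tell osd.21 version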
 

On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan <tuantb@xxxxxxxxxx> wrote:
Dear everyone

I can't start osd.21 (log file attached).
Some pgs can't be repaired. I'm using replica 3 for my data pool.
It seems some objects in those pgs have failed.

I tried deleting the data related to those objects, but osd.21 still won't start.
I also removed osd.21, but then other osds went down (e.g. osd.86 went down and won't start).

Please guide me in debugging this. Thanks!
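
If it helps, a rough sketch of how I could capture a more detailed log from the failing osd (assuming the usual ceph.conf layout; section names and paths may differ): raise its debug levels and start it in the foreground:

[osd.21]
    debug osd = 20
    debug filestore = 20
    debug ms = 1

# ceph-osd -i 21 -f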

--
Tuan
Ha Noi - VietNam




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
