Re: how to re-add a deleted osd device as a osd with data

lin zhou <hnuzhoulin2@xxxxxxxxx> · Tue, 29 Mar 2016 14:00:44 +0800

Hi, Christian.
When I re-added these OSDs (0, 3, 9, 12, 15), the high latency occurred again.
By default these OSDs got a CRUSH weight of 0 (their reweight is 1):

root@node-65:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      103.7   root default
-2      8.19            host node-65
18      2.73                    osd.18  up      1
21      0                       osd.21  up      1
24      2.73                    osd.24  up      1
27      2.73                    osd.27  up      1
30      0                       osd.30  up      1
33      0                       osd.33  up      1
0       0                       osd.0   up      1
3       0                       osd.3   up      1
6       0                       osd.6   down    0
9       0                       osd.9   up      1
12      0                       osd.12  up      1
15      0                       osd.15  up      1
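Several of the re-added OSDs, including 0, 3, 9, 12 and 15, are up but carry a CRUSH weight of 0 in the tree above. A minimal sketch for picking such OSDs out of the plain-text tree, assuming exactly the column layout shown here (newer releases print extra columns, so the field numbers would need adjusting):

```shell
#!/bin/sh
# List OSDs that are "up" but have a CRUSH weight of 0.
# Assumes the column layout of the `ceph osd tree` output above:
#   $1 id, $2 weight, $3 name (osd.N), $4 up/down, $5 reweight
# Host/root lines have "host"/"root" in $3 and are skipped,
# as is the "# id weight ..." header line.
zero_weight_osds() {
    awk '$3 ~ /^osd\./ && $2 == 0 && $4 == "up" { print $3 }'
}

# Usage on a live cluster:
#   ceph osd tree | zero_weight_osds
```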

ceph osd perf (columns: osd id, fs_commit_latency(ms), fs_apply_latency(ms)):
    0                  9825                10211
    3                  9398                 9775
    9                 35852                36904
   12                 24716                25626
   15                 18893                19633

But iostat shows no I/O on these devices,
and smartctl finds no errors on them.
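To spot latency outliers like these without eyeballing the perf output, a filter along these lines could be used. It assumes the three-column layout shown above (osd id, commit latency, apply latency, both in ms); the 1000 ms default threshold is an arbitrary example, not a Ceph recommendation.

```shell
#!/bin/sh
# Print OSDs whose commit latency (2nd column of `ceph osd perf`) exceeds
# a threshold in milliseconds. Header lines are skipped because their
# first field is not a number. The 1000 ms default is only an example.
slow_osds() {
    limit=${1:-1000}
    awk -v limit="$limit" '$1 ~ /^[0-9]+$/ && $2 + 0 > limit + 0 { print "osd." $1, $2, "ms" }'
}

# Usage on a live cluster:
#   ceph osd perf | slow_osds 1000
```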

2016-03-29 13:22 GMT+08:00 lin zhou <hnuzhoulin2@xxxxxxxxx>:
> Thanks. I tried this method, following the Ceph documentation.
> But at first I only tested it on osd.6, and the leveldb of osd.6 is
> broken, so it cannot start.
>
> When I tried it on the other OSDs, it worked.
>
> 2016-03-29 8:22 GMT+08:00 Christian Balzer <chibi@xxxxxxx>:
>> On Mon, 28 Mar 2016 18:36:14 +0800 lin zhou wrote:
>>
>>> > Hello,
>>> >
>>> > On Sun, 27 Mar 2016 13:41:57 +0800 lin zhou wrote:
>>> >
>>> > > Hi, guys.
>>> > > Some days ago, one OSD showed a large latency in ceph osd
>>> > > perf, and this device caused high iowait on this node.
>>> > The thing to do at that point would have been to look at things with atop
>>> > or iostat to verify that it was the device itself that was slow, and not
>>> > that it was genuinely busy, maybe due to uneven activity.
>>> > As well as a quick glance at SMART, of course.
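A minimal sketch of those checks for one disk, assuming iostat (sysstat) and smartctl (smartmontools) are installed. The device name is a hypothetical example; mapping an OSD id to its disk (e.g. via the mount point of /var/lib/ceph/osd/ceph-<id>) is left out. atop is interactive, so it is not scripted here. By default the sketch only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 also executes it.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Check whether a disk is slow or failing rather than just busy.
# $1 is a device name such as sdb (hypothetical example).
check_disk() {
    dev=$1
    # Extended stats, 5 samples 1 s apart: compare await and %util
    # with the other OSD disks on the same host.
    run iostat -x "$dev" 1 5
    # SMART overall health verdict, then attributes and error log.
    run smartctl -H "/dev/$dev"
    run smartctl -a "/dev/$dev"
}

# Usage: DRY_RUN=0 check_disk sdb
```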
>>>
>>> Thanks. I will do this the next time I face this problem.
>>>
>>> > > So, I deleted this OSD and then checked this device.
>>> > If that device (HDD, SSD, which model?) slowed down your cluster, you
>>> > should not have deleted it.
>>> > The best method would have been to set your cluster to noout and stop
>>> > that specific OSD.
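In command form, that advice might look roughly like the sketch below. The Upstart job syntax (stop/start ceph-osd id=N) is an assumption about the host's init system (Ubuntu 14.04 era); on systemd hosts the equivalent would be systemctl stop ceph-osd@N. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 also executes it.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Stop one suspect OSD without letting the cluster rebalance around it.
# Assumes Upstart job names; adapt for systemd or sysvinit.
isolate_osd() {
    id=$1
    run ceph osd set noout        # cluster-wide: stopped OSDs stay "in"
    run stop ceph-osd id="$id"
}

# Bring it back once the disk has been checked (or replaced).
release_osd() {
    id=$1
    run start ceph-osd id="$id"
    run ceph osd unset noout
}

# Usage: DRY_RUN=0 isolate_osd 6
```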
>>> >
>>> > When you say "delete", what exact steps did you take?
>>> > Did this include removing it from the crush map?
>>>
>>> Yes, I removed it from the CRUSH map, deleted its auth key, and ran ceph osd rm.
>>>
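The three removal steps described here correspond roughly to the following commands (a sketch; the OSD id is an example). None of them touch the data on the disk itself: they only remove the OSD from the CRUSH map, the auth database and the OSD map. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 also executes it.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Remove an OSD from the cluster maps. The daemon should already be
# stopped: ceph osd rm refuses to remove an OSD that is still up.
remove_osd() {
    id=$1
    run ceph osd crush remove "osd.$id"   # drop it from the CRUSH map
    run ceph auth del "osd.$id"           # delete its cephx key
    run ceph osd rm "$id"                 # remove it from the OSD map
}

# Usage: DRY_RUN=0 remove_osd 6
```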
>>
>> Google is your friend; if you deleted it like in the link below, you should
>> be able to re-add it the same way:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-June/002345.html
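Reversing those removal steps for an OSD whose data directory is still intact might look roughly like this sketch; it is not necessarily what the linked post describes. Assumptions: the data directory is mounted at the default /var/lib/ceph/osd/ceph-<id>, the daemon is started with Upstart syntax, and the CRUSH weight is chosen by hand (2.73 matches the other OSDs on node-65 in the tree above; a weight of 0 leaves the OSD without any PGs). ceph osd create hands out the lowest free id, so check that it prints the id you expect. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 also executes it.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Re-register an OSD whose data directory survived removal.
# Args: id, CRUSH weight, host bucket, [uuid; defaults to the fsid file].
readd_osd() {
    id=$1; weight=$2; host=$3
    dir=/var/lib/ceph/osd/ceph-$id
    uuid=${4:-$(cat "$dir/fsid")}
    # Reuse the OSD's own UUID; prints the allocated id (lowest free one).
    run ceph osd create "$uuid"
    # Re-import the key that is still stored in the data directory.
    run ceph auth add "osd.$id" osd 'allow *' mon 'allow profile osd' -i "$dir/keyring"
    # Put it back into CRUSH with a non-zero weight so PGs map to it.
    run ceph osd crush add "osd.$id" "$weight" host="$host"
    # Upstart syntax assumed. If it fails to start, "ceph-osd -i $id -d"
    # runs it in the foreground with the log on stderr.
    run start ceph-osd id="$id"
}

# Usage: DRY_RUN=0 readd_osd 0 2.73 node-65
```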
>>
>> Christian
>>
>>> > > But no errors were found.
>>> > >
>>> > > And now I want to re-add this device to the cluster with its data.
>>> > >
>>> > If you deleted/removed the OSD, all its data was already replicated
>>> > elsewhere, so you're unlikely to save much (if any) data movement by
>>> > re-adding it.
>>>
>>> Yes, the cluster finished rebalancing, but I now have one unfound
>>> object. The recovery_state section of the pg query output says this OSD
>>> is down, while the other OSDs are OK.
>>> So I want to bring this OSD back to recover the unfound object.
>>>
>>> And mark_unfound_lost revert/delete does not work:
>>> Error EINVAL: pg has 1 unfound objects but we haven't probed all sources,
>>>
>>> For details, see:
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008452.html
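For reference, a sketch of the commands usually involved in inspecting an unfound object (the PG id 2.5 is a made-up example). In the pg query output, recovery_state contains a might_have_unfound list; mark_unfound_lost is refused with the EINVAL above as long as an OSD in that list has not been probed, e.g. one whose status is "osd is down". Newer releases rename list_missing to list_unfound. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 also executes it.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Inspect a PG that reports unfound objects. $1 is the PG id.
inspect_unfound() {
    pg=$1
    run ceph health detail           # which PGs have unfound objects
    run ceph pg "$pg" list_missing   # the missing/unfound objects themselves
    run ceph pg "$pg" query          # recovery_state -> might_have_unfound
}

# Usage: DRY_RUN=0 inspect_unfound 2.5
# Only once every might_have_unfound source has been probed (or is lost):
#   ceph pg 2.5 mark_unfound_lost revert
```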
>>>
>>> Thanks again.
>>>
>>> > >
>>> > > I tried to use ceph-osd to add it, but it cannot start. The logs are
>>> > > pasted at: https://gist.github.com/hnuzhoulin/836f9e633b90041e89ad
>>> > >
>>> > > So, what are the recommended steps?
>>> > That depends on how you deleted it, but at this point your data is
>>> > likely to be mostly stale anyway, so I'd start from scratch.
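Starting from scratch with the ceph-disk tool of this era (later replaced by ceph-volume) might look like the sketch below. It assumes the old OSD id has already been removed from CRUSH, auth and the OSD map as described earlier in the thread, and /dev/sdX is a placeholder device name. zap destroys everything on the disk, so the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 also executes it.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Wipe a disk and prepare a fresh OSD on it. DESTROYS ALL DATA ON $1.
recreate_osd() {
    dev=$1
    run ceph-disk zap "$dev"       # wipe partition table and old OSD data
    run ceph-disk prepare "$dev"   # create and format data/journal partitions
    # udev normally activates the prepared disk; if it does not:
    #   ceph-disk activate "${dev}1"
}

# Usage: DRY_RUN=0 recreate_osd /dev/sdX
```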
>>>
>>> > Christian
>>> > --
>>> > Christian Balzer Network/Systems Engineer
>>> > chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
>>> > http://www.gol.com/
>>> >
>>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com