Hi Christian,

When I re-added these OSDs (0, 3, 9, 12, 15), the high latency occurred
again. The default reweight of these OSDs is 0.0.

root@node-65:~# ceph osd tree
# id    weight  type name               up/down reweight
-1      103.7   root default
-2      8.19        host node-65
18      2.73            osd.18          up      1
21      0               osd.21          up      1
24      2.73            osd.24          up      1
27      2.73            osd.27          up      1
30      0               osd.30          up      1
33      0               osd.33          up      1
0       0               osd.0           up      1
3       0               osd.3           up      1
6       0               osd.6           down    0
9       0               osd.9           up      1
12      0               osd.12          up      1
15      0               osd.15          up      1

ceph osd perf:
0       9825    10211
3       9398    9775
9       35852   36904
12      24716   25626
15      18893   19633

But the iostat output for these devices is empty, and smartctl reports
no errors on these OSD devices.

2016-03-29 13:22 GMT+08:00 lin zhou <hnuzhoulin2@xxxxxxxxx>:
> Thanks. I tried this method, just as the Ceph documentation says.
> But I tested osd.6 this way, and the leveldb of osd.6 is
> broken, so it cannot start.
>
> When I try this for the other OSDs, it works.
>
> 2016-03-29 8:22 GMT+08:00 Christian Balzer <chibi@xxxxxxx>:
>> On Mon, 28 Mar 2016 18:36:14 +0800 lin zhou wrote:
>>
>>> > Hello,
>>> >
>>> > On Sun, 27 Mar 2016 13:41:57 +0800 lin zhou wrote:
>>> >
>>> > > Hi, guys.
>>> > > Some days ago, one OSD showed a large latency in ceph osd
>>> > > perf, and this device gave this node a high CPU await.
>>> > The thing to do at that point would have been look at things with atop
>>> > or iostat to verify that it was the device itself that was slow and not
>>> > because it was genuinely busy due to uneven activity maybe.
>>> > As well as a quick glance at SMART of course.
>>>
>>> Thanks. I will follow this when I face this problem next time.
>>>
>>> > > So I deleted this OSD and then checked this device.
>>> > If that device (HDD, SSD, which model?) slowed down your cluster, you
>>> > should not have deleted it.
>>> > The best method would have been to set your cluster to noout and stop
>>> > that specific OSD.
>>> >
>>> > When you say "delete", what exact steps did you take?
>>> > Did this include removing it from the crush map?
>>>
>>> Yes, I deleted it from the crush map, deleted its auth, and ran
>>> osd rm.
>>>
>>
>> Google is your friend, if you deleted it like in the link below you should
>> be able to re-add it the same way:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-June/002345.html
>>
>> Christian
>>
>>> > > But no errors were found.
>>> > >
>>> > > And now I want to re-add this device to the cluster with its data.
>>> > >
>>> > All the data was already replicated elsewhere if you deleted/removed
>>> > the OSD, you're likely not going to save much if any data movement by
>>> > re-adding it.
>>>
>>> Yes, the cluster finished rebalancing, but I face a problem with one
>>> unfound object. The recovery_state section of the pg query output says
>>> this OSD is down, but the other OSDs are OK.
>>> So I want to recover this OSD in order to recover the unfound object.
>>>
>>> And mark_unfound_lost revert/delete does not work:
>>> Error EINVAL: pg has 1 unfound objects but we haven't probed all sources,
>>>
>>> For details, see:
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008452.html
>>>
>>> Thanks again.
>>>
>>> > >
>>> > > I tried using ceph-osd to add it, but it cannot start. The logs are
>>> > > pasted at: https://gist.github.com/hnuzhoulin/836f9e633b90041e89ad
>>> > >
>>> > > So what are the recommended steps?
>>> > That depends on how you deleted it, but at this point your data is
>>> > likely to be mostly stale anyway, so I'd start from scratch.
>>>
>>> > Christian
>>> > --
>>> > Christian Balzer        Network/Systems Engineer
>>> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
>>> > http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
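
A footnote on the ceph osd perf figures at the top of this thread: the slow OSDs can be picked out of that output with a small filter. This is only a minimal sketch. It assumes every data line has the form "<osd id> <latency ms> <latency ms>", as in the excerpt quoted above; the function name slow_osds and the threshold passed to it are hypothetical, chosen for illustration.

```shell
# Print the ids of OSDs whose latency exceeds a threshold (ms).
# Reads lines of the assumed form "<osd id> <latency ms> <latency ms>"
# on stdin. Lines whose first field is not a plain integer (for
# example a column header) are skipped.
slow_osds() {
    threshold="$1"
    awk -v t="$threshold" '
        $1 ~ /^[0-9]+$/ && ($2 + 0 > t + 0 || $3 + 0 > t + 0) { print $1 }
    '
}
```

It would be used as "ceph osd perf | slow_osds 1000". Cross-checking the ids it reports against iostat and smartctl on the same node, as suggested earlier in the thread, helps tell a slow device from a busy one.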