Replace OSD drive without removing/re-adding the OSD

Hi Cedric,

Thanks for your reply.

On Sun, May 4, 2014 at 6:16 PM, Cedric Lemarchand <cedric at yipikai.org> wrote:

>  Hi Indra,
>
> On 04/05/2014 06:11, Indra Pramana wrote:
>
>  Just wanted to share that I tried this yesterday, and it doesn't work:
>
> > - ceph osd set noout
> > - sudo stop ceph-osd id=12
> > - Replace the drive, and once done:
> > - sudo start ceph-osd id=12
>
> You said a few lines further down that a new OSD number is assigned, so
> there is a typo here: you would use "sudo start ceph-osd id=new_id" (13
> for example). I don't see how you could start the new OSD with the number
> of the old one (unless you removed it first -- please clarify).
>

I actually wanted to try replacing the drive and starting the OSD back up
without zapping/preparing the new disk, which is why I used the same OSD
number. It didn't work. :)  So I went ahead with the normal procedure
(add a new OSD and remove the old one).
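
For the record, this is roughly the sequence I used (just a sketch -- the
host name ceph-node1 and the device /dev/sdc are examples from my setup,
adjust them to yours):

  # prepare the replacement drive; a new OSD id is assigned automatically
  ceph-deploy disk zap ceph-node1:sdc
  ceph-deploy osd prepare ceph-node1:sdc

  # once the new OSD is up and in, take the old one out and remove it
  ceph osd out 12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12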



>   > - ceph osd unset noout
>
>  Once the drive is replaced, we need to ceph-deploy zap and prepare the new
> drive, and a new OSD number will be assigned. Remapping will start
> immediately after the new OSD is in the cluster.
>
> I am not a guru, but I think this is expected behaviour: noout only
> prevents OSDs from being marked out; here a new one is added to the cluster,
> so a remapping begins.
>
> One thing that comes to mind is that, at the moment of the remapping, you
> still have the old OSD marked IN+DOWN, so the remapping may be computed
> with all the OSDs marked IN, and a new remapping is computed right after you
> remove the old OSD.
>
> Maybe the noin and nobackfill flags could play well here in order to *freeze*
> all actions related to OSD topology changes until everything has been
> done, so I would try:
>
> - set noout + noin and/or nobackfill (maybe noin would suffice; note that
> the cluster is naked at this time ...)
> - stop old OSD
> - remove old OSD
> - add new OSD
> - unset flags
>

Thanks for the recommendation! I didn't think about this earlier. Yes, it may
be better to remove the old OSD first before adding the new one; that way it
may reuse the same OSD number as well. I will try this method next time --
I've sketched the commands below for my own reference.
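
A rough sketch of that flow, assuming the old OSD is osd.12 on host ceph-node1
and the replacement disk shows up as /dev/sdc (those names are just examples
from my setup, and the exact ceph-deploy arguments may differ):

  # freeze topology-related data movement first
  ceph osd set noout
  ceph osd set noin
  ceph osd set nobackfill

  # stop and remove the old OSD
  sudo stop ceph-osd id=12
  ceph osd out 12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12

  # replace the drive, then zap and prepare it
  # (it may be assigned the freed OSD id)
  ceph-deploy disk zap ceph-node1:sdc
  ceph-deploy osd prepare ceph-node1:sdc

  # let recovery proceed
  ceph osd unset nobackfill
  ceph osd unset noin
  ceph osd unset noout

  # if the new OSD stayed out because noin was set, mark it in manually,
  # e.g.: ceph osd in 12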

Cheers.




>
> Cheers
>
>
>   Then we can safely remove the old OSD, unset noout, and wait for
> recovery to complete.
>
>  However, setting noout in the first place did help prevent remapping
> from taking place while we stopped the OSD and replaced the drive. So it's
> advisable to use this flag when replacing a drive -- unless the drive
> has already failed and the OSD is already down in the first place.
>
>  Thank you.
>
>
>
> On Sat, May 3, 2014 at 5:51 PM, Andrey Korolyov <andrey at xdel.ru> wrote:
>
>> On Sat, May 3, 2014 at 4:01 AM, Indra Pramana <indra at sg.or.id> wrote:
>> > Sorry, forgot to cc the list.
>> >
>> > On 3 May 2014 08:00, "Indra Pramana" <indra at sg.or.id> wrote:
>> >>
>> >> Hi Andrey,
>> >>
>> >> I actually wanted to try this (instead of removing and re-adding the
>> >> OSD) to avoid remapping of PGs to other OSDs and the unnecessary I/O
>> >> load.
>> >>
>> >> Are you saying that doing this will also trigger remapping? I thought it
>> >> would just do recovery to replace the PGs that went missing as a result
>> >> of the drive replacement?
>> >>
>> >> Thank you.
>> >>
>>
>>  Yes, remapping will take place, though it is a bit counterintuitive,
>> and I suspect the root cause is the same as the double data-placement
>> recalculation of the out + rm procedure. The Inktank people can probably
>> answer the question in more detail. I also think that preserving the
>> collections might eliminate the remap during this kind of refill, though
>> it is not a trivial thing to do and I have not experimented with it.
>>
>> >> On 2 May 2014 21:02, "Andrey Korolyov" <andrey at xdel.ru> wrote:
>> >>>
>> >>> On 05/02/2014 03:27 PM, Indra Pramana wrote:
>> >>> > Hi,
>> >>> >
>> >>> > May I know if it's possible to replace an OSD drive without removing
>> >>> > and re-adding the OSD? I want to avoid the time and the excessive I/O
>> >>> > load which will occur during the recovery process when:
>> >>> >
>> >>> > - the OSD is removed; and
>> >>> > - the OSD is put back into the cluster.
>> >>> >
>> >>> > I read David Zafman's comment on this thread, that we can set "noout",
>> >>> > take the OSD "down", replace the drive, and then bring the OSD back
>> >>> > "up" and unset "noout".
>> >>> >
>> >>> > http://www.spinics.net/lists/ceph-users/msg05959.html
>> >>> >
>> >>> > May I know if it's possible to do this?
>> >>> >
>> >>> > - ceph osd set noout
>> >>> > - sudo stop ceph-osd id=12
>> >>> > - Replace the drive, and once done:
>> >>> > - sudo start ceph-osd id=12
>> >>> > - ceph osd unset noout
>> >>> >
>> >>> > The cluster was built using ceph-deploy; can we just replace a drive
>> >>> > like that without zapping and preparing the disk with ceph-deploy?
>> >>> >
>> >>>
>> >>> There will be absolutely no quirks except continuous remapping, with
>> >>> peering along the entire recovery process. If your cluster can handle
>> >>> this well, there is absolutely no problem going through this flow.
>> >>> Otherwise, the longer out+in flow involves only two short, intensive
>> >>> recalculations, which can be done at a scheduled time, compared with
>> >>> the peering during remap, which can introduce unnecessary I/O spikes.
>> >>>
>> >>> > Looking forward to your reply, thank you.
>> >>> >
>> >>> > Cheers.
>> >>> >
>> >>> >
>> >>>
>> >
>>
>
>
>
>
> --
> Cédric
>
>

