Replace OSD drive without remove/re-add OSD

indra@xxxxxxxx (Indra Pramana) · Sun, 4 May 2014 12:11:57 +0800

Dear all,

I would like to share that I tried this yesterday, and it does not work:

> - ceph osd set noout
> - sudo stop ceph-osd id=12
> - Replace the drive, and once done:
> - sudo start ceph-osd id=12
> - ceph osd unset noout

Once the drive is replaced, we need to use ceph-deploy to zap and prepare
the new drive, and a new OSD number will be assigned. Remapping starts
immediately after the new OSD is in the cluster. We can then safely remove
the old OSD, unset noout, and wait for recovery to complete.

However, setting noout in the first place did help to prevent remapping
from taking place while we stopped the OSD and replaced the drive. So it is
advisable to use this feature when replacing a drive, unless the drive has
already failed and the OSD is already down.
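
As a small illustration of that check, the sketch below reads text in the
style of "ceph osd dump" and reports whether a given OSD is up or down, so
one can decide whether setting noout is still worthwhile. The helper names
osd_state and noout_advice are hypothetical, and the sample lines are
illustrative only, not output from my cluster.

```shell
#!/bin/sh
# Read 'ceph osd dump' style text on stdin and print the state field
# ("up" or "down") for the given OSD id, or "unknown" if it is not listed.
# Hypothetical helper; assumes lines of the form "osd.N up in ...".
osd_state() {
    awk -v name="osd.$1" '
        $1 == name { print $2; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Turn the state into a short piece of advice (hypothetical helper).
noout_advice() {
    case "$1" in
        up)   echo "set noout before stopping the OSD" ;;
        down) echo "OSD already down; setting noout may no longer help" ;;
        *)    echo "OSD state unknown; check the cluster first" ;;
    esac
}

# Illustrative sample input; on a real cluster one would instead run:
#   ceph osd dump | osd_state 12
state=$(osd_state 12 <<EOF
osd.11 up   in  weight 1
osd.12 down in  weight 1
EOF
)
noout_advice "$state"
```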

Thank you.

On Sat, May 3, 2014 at 5:51 PM, Andrey Korolyov <andrey at xdel.ru> wrote:

> On Sat, May 3, 2014 at 4:01 AM, Indra Pramana <indra at sg.or.id> wrote:
> > Sorry forgot to cc the list.
> >
> > On 3 May 2014 08:00, "Indra Pramana" <indra at sg.or.id> wrote:
> >>
> >> Hi Andrey,
> >>
> >> I actually wanted to try this (instead of removing and re-adding the OSD)
> >> to avoid remapping of PGs to other OSDs and the unnecessary I/O load.
> >>
> >> Are you saying that doing this will also trigger remapping? I thought it
> >> would just do recovery to replace the PGs missing as a result of the
> >> drive replacement?
> >>
> >> Thank you.
> >>
>
> Yes, remapping will take place, though it is a bit counterintuitive,
> and I suspect the root cause is the same as for the double data
> placement recalculation in the out + rm procedure. The Inktank
> people can probably answer this in more detail. I also think that
> preserving the collections might eliminate the remap during this
> kind of refill, though it is not a trivial thing to do and I have
> not experimented with it.
>
> >> On 2 May 2014 21:02, "Andrey Korolyov" <andrey at xdel.ru> wrote:
> >>>
> >>> On 05/02/2014 03:27 PM, Indra Pramana wrote:
> >>> > Hi,
> >>> >
> >>> > May I know if it's possible to replace an OSD drive without removing /
> >>> > re-adding back the OSD? I want to avoid the time and the excessive I/O
> >>> > load which will happen during the recovery process at the time when:
> >>> >
> >>> > - the OSD is removed; and
> >>> > - the OSD is being put back into the cluster.
> >>> >
> >>> > I read David Zafman's comment on this thread, that we can set "noout",
> >>> > take OSD "down", replace the drive, and then bring the OSD back "up"
> >>> > and unset "noout".
> >>> >
> >>> > http://www.spinics.net/lists/ceph-users/msg05959.html
> >>> >
> >>> > May I know if it's possible to do this?
> >>> >
> >>> > - ceph osd set noout
> >>> > - sudo stop ceph-osd id=12
> >>> > - Replace the drive, and once done:
> >>> > - sudo start ceph-osd id=12
> >>> > - ceph osd unset noout
> >>> >
> >>> > The cluster was built using ceph-deploy, can we just replace a drive
> >>> > like that without zapping and preparing the disk using ceph-deploy?
> >>> >
> >>>
> >>> There will be absolutely no quirks except continuous remapping with
> >>> peering along the entire recovery process. If your cluster can handle
> >>> this well, there is absolutely no problem going through this flow.
> >>> Otherwise, in the longer out+in flow, there are only two short intensive
> >>> recalculations, which can be done at a scheduled time, compared with
> >>> peering during remap, which can introduce unnecessary I/O spikes.
> >>>
> >>> > Looking forward to your reply, thank you.
> >>> >
> >>> > Cheers.
> >>> >
> >>>
> >
>