Thank you Eugen so much for your insights! We will definitely apply this
method next time. :-)

Best Regards,
Mary

On Sat, Apr 27, 2024 at 1:29 AM Eugen Block <eblock@xxxxxx> wrote:

> If the rest of the cluster is healthy and your resiliency is
> configured properly, for example to sustain the loss of one or more
> hosts at a time, you don’t need to worry about a single disk. Just
> take it out and remove it (forcefully) so it doesn’t have any clients
> anymore. Ceph will immediately assign different primary OSDs and your
> clients will be happy again. ;-)
>
> Quoting Mary Zhang <maryzhang0920@xxxxxxxxx>:
>
> > Thank you Wesley for the clear explanation of the difference between
> > the 2 methods!
> > The tracker issue you mentioned, https://tracker.ceph.com/issues/44400,
> > talks about primary-affinity. Could primary-affinity help remove an OSD
> > with a hardware issue from the cluster gracefully?
> >
> > Thanks,
> > Mary
> >
> > On Fri, Apr 26, 2024 at 8:43 AM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
> > wrote:
> >
> >> What you want to do is to stop the OSD (and all the copies of data it
> >> contains) by stopping the OSD service immediately. The downside of this
> >> approach is that it causes the PGs on that OSD to become degraded. But
> >> the upside is that the OSD with the bad hardware immediately stops
> >> participating in any client IO (the source of your RGW 503s). In this
> >> situation the PGs go into degraded+backfilling.
> >>
> >> The alternative method is to keep the failing OSD up and in the cluster
> >> but slowly migrate the data off of it. This would be a long, drawn-out
> >> period of time in which the failing disk would continue to serve client
> >> reads and also facilitate backfill, but you wouldn't take a copy of the
> >> data out of the cluster and cause degraded PGs. In this scenario the PGs
> >> would be remapped+backfilling.
> >>
> >> I tried to find a way to have your cake and eat it too in relation to
> >> this "predicament" in this tracker issue:
> >> https://tracker.ceph.com/issues/44400
> >> but it was deemed "won't fix".
> >>
> >> Respectfully,
> >>
> >> *Wes Dillingham*
> >> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> >> wes@xxxxxxxxxxxxxxxxx
> >>
> >> On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang <maryzhang0920@xxxxxxxxx>
> >> wrote:
> >>
> >>> Thank you Eugen for your warm help!
> >>>
> >>> I'm trying to understand the difference between the 2 methods.
> >>>
> >>> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
> >>> Documentation
> >>> <https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd>
> >>> says it involves 2 steps:
> >>>
> >>> 1. evacuating all placement groups (PGs) from the OSD
> >>> 2. removing the PG-free OSD from the cluster
> >>>
> >>> For method 2, or the procedure you recommended, Adding/Removing OSDs —
> >>> Ceph Documentation
> >>> <https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual>
> >>> says "After the OSD has been taken out of the cluster, Ceph begins
> >>> rebalancing the cluster by migrating placement groups out of the OSD
> >>> that was removed."
> >>>
> >>> What's the difference between "evacuating PGs" in method 1 and
> >>> "migrating PGs" in method 2? I think method 1 must read the OSD to be
> >>> removed. Otherwise, we would not see the slow ops warning. Does
> >>> method 2 not involve reading this OSD?
> >>>
> >>> Thanks,
> >>> Mary
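The PG states Wesley describes can be watched directly while either
approach runs. A minimal sketch using stock ceph CLI commands, run from
any node with an admin keyring (the osd id is a placeholder and exact
output varies by release):

    # overall cluster health, recovery progress and any slow ops warnings
    ceph -s
    ceph health detail

    # method 1 (orchestrator drain, e.g. "ceph orch osd rm <id> --zap"):
    # the OSD stays up and in, so its PGs appear as remapped+backfilling
    ceph orch osd rm status
    ceph pg ls remapped

    # method 2 (stop the daemon, mark out, purge): the OSD is down, so its
    # PGs appear as degraded+backfilling and recovery reads come from the
    # surviving replicas rather than the failing disk
    ceph pg ls degraded
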
> >>> On Fri, Apr 26, 2024 at 5:15 AM Eugen Block <eblock@xxxxxx> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > if you remove the OSD this way, it will be drained, which means that
> >>> > Ceph will try to recover PGs from this OSD, and in case of hardware
> >>> > failure that might lead to slow requests. It might make sense to
> >>> > forcefully remove the OSD without draining:
> >>> >
> >>> > - stop the osd daemon
> >>> > - mark it as out
> >>> > - osd purge <id|osd.id> [--force] [--yes-i-really-mean-it]
> >>> >
> >>> > Regards,
> >>> > Eugen
> >>> >
> >>> > Quoting Mary Zhang <maryzhang0920@xxxxxxxxx>:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > We recently removed an OSD from our Ceph cluster. Its underlying
> >>> > > disk has a hardware issue.
> >>> > >
> >>> > > We used the command: ceph orch osd rm osd_id --zap
> >>> > >
> >>> > > During the process, the cluster sometimes entered a warning state
> >>> > > with slow ops on this OSD. Our RGW also failed to respond to
> >>> > > requests and returned 503.
> >>> > >
> >>> > > We restarted the RGW daemon to make it work again, but the same
> >>> > > failure occurred from time to time. Eventually we noticed that the
> >>> > > RGW 503 errors are a result of the OSD slow ops.
> >>> > >
> >>> > > Our cluster has 18 hosts and 210 OSDs. We expected that removing an
> >>> > > OSD with a hardware issue wouldn't impact cluster performance and
> >>> > > RGW availability. Is our expectation reasonable? What's the best
> >>> > > way to handle OSDs with hardware failures?
> >>> > >
> >>> > > Thank you in advance for any comments or suggestions.
> >>> > >
> >>> > > Best Regards,
> >>> > > Mary Zhang

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
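One way to express Eugen's forceful-removal steps as concrete commands on
a cephadm-managed cluster like the one in this thread. This is a minimal
sketch, not an authoritative procedure; the osd id, hostname and device
path are placeholders, and the final zap step is only an assumption for
when you want to wipe the disk afterwards:

    # 1. stop the failing OSD's daemon so it stops serving client IO
    ceph orch daemon stop osd.<id>

    # 2. mark it out so CRUSH assigns new primaries and recovery starts
    #    from the surviving replicas
    ceph osd out <id>

    # 3. purge the OSD: removes it from the CRUSH map along with its
    #    auth key and OSD id
    ceph osd purge <id> --yes-i-really-mean-it

    # 4. optionally wipe the old device so it can be replaced or reused
    ceph orch device zap <host> /dev/<device> --force

Because the OSD is stopped rather than drained, the affected PGs go
degraded+backfilling for a while, but the failing disk is no longer in the
client IO path, which is what avoids the slow ops and the RGW 503s
described at the start of the thread.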