Thank you Eugen so much for your insights! We will definitely apply this
method next time. :-)

Best Regards,
Mary

On Sat, Apr 27, 2024 at 1:29 AM Eugen Block <eblock@xxxxxx> wrote:

> If the rest of the cluster is healthy and your resiliency is
> configured properly, for example to sustain the loss of one or more
> hosts at a time, you don’t need to worry about a single disk. Just
> take it out and remove it (forcefully) so it doesn’t have any clients
> anymore. Ceph will immediately assign different primary OSDs and your
> clients will be happy again. ;-)
>
> Quoting Mary Zhang <maryzhang0920@xxxxxxxxx>:
>
> > Thank you Wesley for the clear explanation of the difference between
> > the 2 methods!
> > The tracker issue you mentioned, https://tracker.ceph.com/issues/44400,
> > talks about primary-affinity. Could primary-affinity help remove an OSD
> > with a hardware issue from the cluster gracefully?
> >
> > Thanks,
> > Mary
> >
> > On Fri, Apr 26, 2024 at 8:43 AM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
> > wrote:
> >
> >> What you want to do is to stop the OSD (and all the copies of data it
> >> contains) by stopping the OSD service immediately. The downside of this
> >> approach is that it causes the PGs on that OSD to become degraded. But
> >> the upside is that the OSD with the bad hardware immediately stops
> >> participating in any client IO (the source of your RGW 503s). In this
> >> situation the PGs go into degraded+backfilling.
> >>
> >> The alternative method is to keep the failing OSD up and in the cluster
> >> but slowly migrate the data off of it. This would be a long, drawn-out
> >> period of time in which the failing disk would continue to serve client
> >> reads and also facilitate backfill, but you wouldn't take a copy of the
> >> data out of the cluster and cause degraded PGs. In this scenario the PGs
> >> would be remapped+backfilling.
> >>
> >> I tried to find a way to have your cake and eat it too in relation to
> >> this "predicament" in this tracker issue:
> >> https://tracker.ceph.com/issues/44400
> >> but it was deemed "won't fix".
> >>
> >> Respectfully,
> >>
> >> *Wes Dillingham*
> >> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> >> wes@xxxxxxxxxxxxxxxxx
> >>
> >> On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang <maryzhang0920@xxxxxxxxx>
> >> wrote:
> >>
> >>> Thank you Eugen for your warm help!
> >>>
> >>> I'm trying to understand the difference between the 2 methods.
> >>>
> >>> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
> >>> Documentation
> >>> <https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd>
> >>> says it involves 2 steps:
> >>>
> >>> 1. evacuating all placement groups (PGs) from the OSD
> >>> 2. removing the PG-free OSD from the cluster
> >>>
> >>> For method 2, or the procedure you recommended, Adding/Removing OSDs —
> >>> Ceph Documentation
> >>> <https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual>
> >>> says "After the OSD has been taken out of the cluster, Ceph begins
> >>> rebalancing the cluster by migrating placement groups out of the OSD
> >>> that was removed."
> >>>
> >>> What's the difference between "evacuating PGs" in method 1 and
> >>> "migrating PGs" in method 2? I think method 1 must read the OSD to be
> >>> removed. Otherwise, we would not see the slow ops warning. Does
> >>> method 2 not involve reading this OSD?
> >>>
> >>> Thanks,
> >>> Mary
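The PG states Wesley describes can be watched directly while either
approach runs. A minimal sketch using stock ceph CLI commands, run from
any node with an admin keyring (the osd id is a placeholder and exact
output varies by release):

    # overall cluster health, recovery progress and any slow ops warnings
    ceph -s
    ceph health detail

    # method 1 (orchestrator drain, e.g. "ceph orch osd rm <id> --zap"):
    # the OSD stays up and in, so its PGs appear as remapped+backfilling
    ceph orch osd rm status
    ceph pg ls remapped

    # method 2 (stop the daemon, mark out, purge): the OSD is down, so its
    # PGs appear as degraded+backfilling and recovery reads come from the
    # surviving replicas rather than the failing disk
    ceph pg ls degraded
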
> >>> On Fri, Apr 26, 2024 at 5:15 AM Eugen Block <eblock@xxxxxx> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > if you remove the OSD this way, it will be drained, which means that
> >>> > Ceph will try to recover PGs from this OSD, and in case of hardware
> >>> > failure that might lead to slow requests. It might make sense to
> >>> > forcefully remove the OSD without draining:
> >>> >
> >>> > - stop the osd daemon
> >>> > - mark it as out
> >>> > - osd purge <id|osd.id> [--force] [--yes-i-really-mean-it]
> >>> >
> >>> > Regards,
> >>> > Eugen
> >>> >
> >>> > Quoting Mary Zhang <maryzhang0920@xxxxxxxxx>:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > We recently removed an OSD from our Ceph cluster. Its underlying
> >>> > > disk has a hardware issue.
> >>> > >
> >>> > > We used the command: ceph orch osd rm osd_id --zap
> >>> > >
> >>> > > During the process, the cluster sometimes entered a warning state
> >>> > > with slow ops on this OSD. Our RGW also failed to respond to
> >>> > > requests and returned 503.
> >>> > >
> >>> > > We restarted the RGW daemon to make it work again, but the same
> >>> > > failure occurred from time to time. Eventually we noticed that the
> >>> > > RGW 503 errors are a result of the OSD slow ops.
> >>> > >
> >>> > > Our cluster has 18 hosts and 210 OSDs. We expected that removing an
> >>> > > OSD with a hardware issue wouldn't impact cluster performance and
> >>> > > RGW availability. Is our expectation reasonable? What's the best
> >>> > > way to handle OSDs with hardware failures?
> >>> > >
> >>> > > Thank you in advance for any comments or suggestions.
> >>> > >
> >>> > > Best Regards,
> >>> > > Mary Zhang

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
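One way to express Eugen's forceful-removal steps as concrete commands on
a cephadm-managed cluster like the one in this thread. This is a minimal
sketch, not an authoritative procedure; the osd id, hostname and device
path are placeholders, and the final zap step is only an assumption for
when you want to wipe the disk afterwards:

    # 1. stop the failing OSD's daemon so it stops serving client IO
    ceph orch daemon stop osd.<id>

    # 2. mark it out so CRUSH assigns new primaries and recovery starts
    #    from the surviving replicas
    ceph osd out <id>

    # 3. purge the OSD: removes it from the CRUSH map along with its
    #    auth key and OSD id
    ceph osd purge <id> --yes-i-really-mean-it

    # 4. optionally wipe the old device so it can be replaced or reused
    ceph orch device zap <host> /dev/<device> --force

Because the OSD is stopped rather than drained, the affected PGs go
degraded+backfilling for a while, but the failing disk is no longer in the
client IO path, which is what avoids the slow ops and the RGW 503s
described at the start of the thread.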