Sorry Frank, I typed the wrong name.

On Tue, Apr 30, 2024, 8:51 AM Mary Zhang <maryzhang0920@xxxxxxxxx> wrote:

> Sounds good. Thank you Kevin and have a nice day!
>
> Best Regards,
> Mary
>
> On Tue, Apr 30, 2024, 8:21 AM Frank Schilder <frans@xxxxxx> wrote:
>
>> I think you are panicking way too much. Chances are that you will never
>> need that command, so don't get stressed out by an old post.
>>
>> Just follow what I wrote and, in the extremely rare case that recovery
>> does not complete due to missing information, send an e-mail to this list
>> and state that you still have the disk of the down OSD. Someone will send
>> you the export/import commands within a short time.
>>
>> So stop worrying and just administer your cluster with common storage
>> admin sense.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Mary Zhang <maryzhang0920@xxxxxxxxx>
>> Sent: Tuesday, April 30, 2024 5:00 PM
>> To: Frank Schilder
>> Cc: Eugen Block; ceph-users@xxxxxxx; Wesley Dillingham
>> Subject: Re: Re: Remove an OSD with hardware issue caused rgw 503
>>
>> Thank you Frank for sharing such valuable experience! I really appreciate it.
>> We observe similar timelines: it took more than 1 week to drain our OSD.
>> Regarding exporting PGs from a failed disk and injecting them back into the
>> cluster, do you have any documentation? I found this online: Ceph.io —
>> Incomplete PGs -- OH MY! <https://ceph.io/en/news/blog/2015/incomplete-pgs-oh-my/>,
>> but I'm not sure whether it's the standard process.
>>
>> Thanks,
>> Mary
>>
>> On Tue, Apr 30, 2024 at 3:27 AM Frank Schilder <frans@xxxxxx> wrote:
>>
>> Hi all,
>>
>> I second Eugen's recommendation. We have a cluster with large HDD OSDs
>> where we see the following timings:
>>
>> - drain an OSD: 2 weeks.
>> - down an OSD and let the cluster recover: 6 hours.
>>
>> The drain-OSD procedure is - in my experience - a complete waste of time. It
>> actually puts your cluster at higher risk of a second failure (it's not
>> guaranteed that the bad PG(s) is/are drained first) and also screws up all
>> sorts of internal operations, like scrubbing, for an unnecessarily long time.
>> The recovery procedure is much faster, because it uses all-to-all recovery,
>> while draining is limited to no more than osd_max_backfills PGs at a time,
>> and your broken disk sits in the cluster much longer.
>>
>> On SSDs the "down OSD" method shows a similar speed-up factor.
>>
>> As a safety measure, don't destroy the OSD right away; wait for recovery to
>> complete and only then destroy the OSD and throw away the disk. In case an
>> error occurs during recovery, you can almost always still export PGs from a
>> failed disk and inject them back into the cluster. This, however, requires
>> taking disks out as soon as they show problems and before they fail hard.
>> Leave a little bit of lifetime so you have a chance to recover data. Look at
>> the ddrescue manual for why it is important to stop I/O from a failing disk
>> as soon as possible.
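>>
>> If it ever comes to that, the export/import is typically done with
>> ceph-objectstore-tool while the OSDs involved are stopped. A rough sketch
>> only - the OSD ids, the PG id 2.1f and the file path below are placeholders,
>> not values from your cluster:
>>
>>   # on the host with the failing disk, with osd.17 stopped:
>>   # export the PG's data to a file on a healthy device
>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
>>     --pgid 2.1f --op export --file /mnt/rescue/2.1f.export
>>
>>   # on a host with a healthy osd.23 (also stopped), inject the PG back
>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
>>     --op import --file /mnt/rescue/2.1f.export
>>
>> In a cephadm deployment you would typically run this from inside
>> "cephadm shell --name osd.<id>" so the tool can see the OSD's data path,
>> and then start the OSDs again afterwards.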
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Eugen Block <eblock@xxxxxx>
>> Sent: Saturday, April 27, 2024 10:29 AM
>> To: Mary Zhang
>> Cc: ceph-users@xxxxxxx; Wesley Dillingham
>> Subject: Re: Remove an OSD with hardware issue caused rgw 503
>>
>> If the rest of the cluster is healthy and your resiliency is
>> configured properly, for example to sustain the loss of one or more
>> hosts at a time, you don't need to worry about a single disk. Just
>> take it out and remove it (forcefully) so it doesn't have any clients
>> anymore. Ceph will immediately assign different primary OSDs and your
>> clients will be happy again. ;-)
>>
>> Zitat von Mary Zhang <maryzhang0920@xxxxxxxxx>:
>>
>> > Thank you Wesley for the clear explanation of the difference between the
>> > 2 methods!
>> > The tracker issue you mentioned, https://tracker.ceph.com/issues/44400,
>> > talks about primary-affinity. Could primary-affinity help remove an OSD
>> > with a hardware issue from the cluster gracefully?
>> >
>> > Thanks,
>> > Mary
>> >
>> > On Fri, Apr 26, 2024 at 8:43 AM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
>> > wrote:
>> >
>> >> What you want to do is to stop the OSD (and all the copies of data it
>> >> contains) by stopping the OSD service immediately. The downside of this
>> >> approach is that it causes the PGs on that OSD to be degraded, but the
>> >> upside is that the OSD with the bad hardware immediately stops
>> >> participating in any client IO (the source of your RGW 503s). In this
>> >> situation the PGs go into degraded+backfilling.
>> >>
>> >> The alternative method is to keep the failing OSD up and in the cluster
>> >> but slowly migrate the data off of it. This would be a long, drawn-out
>> >> period of time in which the failing disk would continue to serve client
>> >> reads and also facilitate backfill, but you wouldn't take a copy of the
>> >> data out of the cluster and cause degraded PGs. In this scenario the PGs
>> >> would be remapped+backfilling.
>> >>
>> >> I tried to find a way to have your cake and eat it too in relation to
>> >> this "predicament" in this tracker issue:
>> >> https://tracker.ceph.com/issues/44400 but it was deemed "won't fix".
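>> >>
>> >> On a cephadm-managed cluster, "stopping the OSD service immediately"
>> >> looks roughly like this (osd.17 is only a placeholder id):
>> >>
>> >>   ceph orch daemon stop osd.17   # failing OSD stops serving client IO at once
>> >>   ceph -s                        # its PGs show as degraded; backfill starts
>> >>                                  # once the OSD is marked out
>> >>   ceph pg stat                   # quick view of PG states during recovery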
>> >>
>> >> Respectfully,
>> >>
>> >> *Wes Dillingham*
>> >> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> >> wes@xxxxxxxxxxxxxxxxx
>> >>
>> >> On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang <maryzhang0920@xxxxxxxxx>
>> >> wrote:
>> >>
>> >>> Thank you Eugen for your warm help!
>> >>>
>> >>> I'm trying to understand the difference between the 2 methods.
>> >>> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
>> >>> Documentation
>> >>> <https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd>
>> >>> says it involves 2 steps:
>> >>>
>> >>> 1. evacuating all placement groups (PGs) from the OSD
>> >>> 2. removing the PG-free OSD from the cluster
>> >>>
>> >>> For method 2, or the procedure you recommended, Adding/Removing OSDs —
>> >>> Ceph Documentation
>> >>> <https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual>
>> >>> says "After the OSD has been taken out of the cluster, Ceph begins
>> >>> rebalancing the cluster by migrating placement groups out of the OSD
>> >>> that was removed."
>> >>>
>> >>> What's the difference between "evacuating PGs" in method 1 and
>> >>> "migrating PGs" in method 2? I think method 1 must read the OSD to be
>> >>> removed; otherwise, we would not see the slow ops warning. Does method 2
>> >>> not involve reading this OSD?
>> >>>
>> >>> Thanks,
>> >>> Mary
>> >>>
>> >>> On Fri, Apr 26, 2024 at 5:15 AM Eugen Block <eblock@xxxxxx> wrote:
>> >>>
>> >>> > Hi,
>> >>> >
>> >>> > if you remove the OSD this way, it will be drained, which means that
>> >>> > it will try to recover PGs from this OSD, and in case of a hardware
>> >>> > failure that might lead to slow requests. It might make sense to
>> >>> > forcefully remove the OSD without draining:
>> >>> >
>> >>> > - stop the osd daemon
>> >>> > - mark it as out
>> >>> > - osd purge <id|osd.id> [--force] [--yes-i-really-mean-it]
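>> >>> >
>> >>> > In concrete commands that could look something like this (osd id 17
>> >>> > is only an example):
>> >>> >
>> >>> >   ceph orch daemon stop osd.17                # stop the daemon (cephadm)
>> >>> >   ceph osd out 17                             # mark it out
>> >>> >   ceph osd purge 17 --yes-i-really-mean-it    # drop it from the CRUSH map,
>> >>> >                                               # OSD map and auth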
>> >>> >
>> >>> > Regards,
>> >>> > Eugen
>> >>> >
>> >>> > Zitat von Mary Zhang <maryzhang0920@xxxxxxxxx>:
>> >>> >
>> >>> > > Hi,
>> >>> > >
>> >>> > > We recently removed an osd from our Ceph cluster. Its underlying
>> >>> > > disk has a hardware issue.
>> >>> > >
>> >>> > > We used the command: ceph orch osd rm osd_id --zap
>> >>> > >
>> >>> > > During the process, the ceph cluster sometimes enters a warning
>> >>> > > state with slow ops on this osd. Our rgw also failed to respond to
>> >>> > > requests and returned 503.
>> >>> > >
>> >>> > > We restarted the rgw daemon to make it work again, but the same
>> >>> > > failure occurred from time to time. Eventually we noticed that the
>> >>> > > rgw 503 errors are a result of the osd slow ops.
>> >>> > >
>> >>> > > Our cluster has 18 hosts and 210 OSDs. We expect that removing an
>> >>> > > osd with a hardware issue won't impact cluster performance & rgw
>> >>> > > availability. Is our expectation reasonable? What's the best way to
>> >>> > > handle osds with hardware failures?
>> >>> > >
>> >>> > > Thank you in advance for any comments or suggestions.
>> >>> > >
>> >>> > > Best Regards,
>> >>> > > Mary Zhang

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx