System is now stable. The rebalancing was doing what it should and finished after a couple of hours.

I decided to revisit the primary problem, which was the infamous "too many PGs per OSD" warning, and did some tweaks to the pool settings. It appears that it was actually the autoscaler that was creating so many PGs for the biggest pool; I was apparently just misguided in my expectations, based on guidelines that appear to be out of date now. So I've concluded that there's nothing wrong with how things are laid out, but I do need to raise the monitor alert level for PGs per OSD. That SHOULD be straightforward, but every time I go looking for info on how to do it, I either end up with how to set the initial PG allocation when creating pools or how to set the alert level the old way via ceph.conf options. Very annoying. (I've sketched what I believe the modern commands are at the bottom of this mail, below the quoted thread.)

On the plus side, once again Ceph has weathered a major disruption without losing any data. On the minus side, I really wish it wouldn't simply stall out silently, with no reason given, while rebalancing. When I saw it happening (usually after about 15 minutes), I could kick it back into operation by rebooting a server. Since there was no single OSD or set of OSDs that it seemed to hang on, I just picked the server with the most OSDs reported and rebooted that one. I suspect, however, that any server would have done.

Thanks,
   Tim

On Thu, 2025-02-27 at 08:28 +0100, Frédéric Nass wrote:
>
> ----- On 26 Feb 25, at 16:40, Tim Holloway timh@xxxxxxxxxxxxx wrote:
>
> > Thanks. I did resolve that problem, though I haven't had a chance to
> > update until now.
> >
> > I had already attempted to use ceph orch to remove the daemons, but
> > they didn't succeed.
> >
> > Fortunately, I was able to bring the host online, which allowed the
> > scheduled removals to complete. I confirmed everything was drained,
> > again removed the host from inventory, and powered down.
> >
> > Still got complaints from cephadm about the decommissioned host.
> >
> > I took a break - impatience and ceph don't mix - and came back to
> > address the next problem, which was lots of stuck PGs. Either because
> > cephadm timed out or because something kicked in when I started
> > randomly rebooting OSDs, the host complaint finally disappeared. End
> > of story.
> >
> > Now for what sent me down that path.
> >
> > I had 2 OSDs on one server and felt that that was probably not a good
> > idea, so I marked one for deletion. 4 days later it was still in
> > "destroying" state. More concerning, all signs indicated that despite
> > having been reweighted to 0, the "destroying" OSD was still an
> > essential participant, with no indication that its PGs were being
> > relocated to active servers. Shutting down the "destroying" OSD would
> > immediately trigger a re-allocation panic, but that didn't clean
> > anything. The re-allocation would proceed at a furious pace, then
> > slowly stall out and hang, and the system was degraded. Restarting the
> > OSD brought the PG inventory back up, but stuff still wasn't moving
> > off the OSD.
> >
> > Right about that time I decommissioned the questionable host.
> >
> > Finally, I did a "ceph orch rm osd.x", and terminated the "destroying"
> > OSD permanently, making it finally disappear from the OSD tree list.
> >
> > I also deleted a number of OSD pools that are (hopefully) not going
> > to be missed.
> >
> > Kicking and randomly repeatedly rebooting the other OSDs finally
> > cleared all the stuck PGs, some of which hadn't resolved in over 2
> > days.
> >
> > So at the moment, it's either rebalancing the cleaned-up OSDs or in
> > a loop thinking that it is.
>
> Since you deleted some pools, it's probably the upmap balancer
> rebalancing PGs across the OSDs.
>
> > And the PG-per-OSD count seems way too high,
>
> How much is it right now? With what hardware?
>
> > but the autoscaler doesn't seem to want to do anything about that.
>
> If the PG autoscaler is enabled, you could try adjusting the per-pool
> settings [1] and see if the # of PGs decreases.
> If disabled, you could manually reduce the number of PGs on the
> remaining pools to lower the PG/OSD ratio.
>
> Regards,
> Frédéric.
>
> > Of course, the whole shebang has been unavailable to clients this
> > whole week because of that.
> >
> > I've been considering upgrading to Reef, but recent posts regarding
> > issues resembling what I've been going through are making me pause.
> >
> > Again, thanks!
> > Tim
> >
> > On Wed, 2025-02-26 at 13:57 +0100, Frédéric Nass wrote:
> > > Hi Tim,
> > >
> > > If you can't bring the host back online so that cephadm can remove
> > > these services itself, I guess you'll have to clean up the mess by:
> > >
> > > - removing these services from the cluster (for example with a
> > > 'ceph mon remove {mon-id}' for the monitor)
> > > - forcing their removal from the orchestrator with the --force
> > > option on the commands 'ceph orch daemon rm <names>' and 'ceph orch
> > > host rm <hostname>'. If the --force option doesn't help, then
> > > looking into/editing/removing ceph-config keys like
> > > 'mgr/cephadm/inventory' and
> > > 'mgr/cephadm/host.ceph07.internal.mousetech.com' that 'ceph
> > > config-key dump' output shows might help.
> > >
> > > Regards,
> > > Frédéric.
> > >
> > > ----- On 25 Feb 25, at 16:42, Tim Holloway timh@xxxxxxxxxxxxx wrote:
> > >
> > > > Ack. Another fine mess.
> > > >
> > > > I was trying to clean things up, and the process of tossing
> > > > around OSDs kept getting me reports of slow responses and hanging
> > > > PG operations.
> > > >
> > > > This is Ceph Pacific, by the way.
> > > >
> > > > I found a deprecated server that claimed to have an OSD even
> > > > though it didn't show in either "ceph osd tree" or the dashboard
> > > > OSD list. I suspect that a lot of the grief came from it
> > > > attempting to use resources that weren't always seen as
> > > > resources.
> > > >
> > > > I shut down the server's OSD (removed the daemon using ceph
> > > > orch), then foolishly deleted the server from the inventory
> > > > without doing a drain first.
> > > >
> > > > Now cephadm hates me (key not found), and there are still an MDS
> > > > and a MON listed as 'ceph orch ls' daemons even after I powered
> > > > the host off.
> > > >
> > > > I cannot do a ceph orch daemon delete because there's no longer
> > > > an IP address available to the daemon delete, and I cannot clear
> > > > the cephadm queue:
> > > >
> > > > [ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed:
> > > > 'ceph07.internal.mousetech.com'
> > > >
> > > > Any suggestions?
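
P.S. For the archives: as far as I can tell, the "too many PGs per OSD"
health warning (TOO_MANY_PGS) is driven by the mon_max_pg_per_osd option,
and on Pacific it can be changed at runtime through the centralized config
rather than ceph.conf. A minimal sketch, assuming that is indeed the right
knob (the 300 is just an example value, not a recommendation):

    # show the current threshold (250 by default, I believe)
    ceph config get mon mon_max_pg_per_osd

    # raise it cluster-wide via the centralized config database
    ceph config set global mon_max_pg_per_osd 300

Corrections welcome if that's the wrong knob.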
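
On the autoscaler side, Frédéric's suggestion about per-pool settings [1]
maps, if I understand it correctly, to commands along these lines (the
pool name and numbers are placeholders):

    # see what the autoscaler thinks each pool should have
    ceph osd pool autoscale-status

    # guide the autoscaler for a given pool
    ceph osd pool set <pool> pg_autoscale_mode on
    ceph osd pool set <pool> target_size_ratio 0.8

    # or, with the autoscaler off for that pool, shrink the PG count by hand
    ceph osd pool set <pool> pg_autoscale_mode off
    ceph osd pool set <pool> pg_num 64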
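
And for anyone who lands here with the same dead-host problem from earlier
in the thread, Frédéric's cleanup advice boils down to commands roughly
like these (names are placeholders; the exact keys under mgr/cephadm/ will
differ per cluster):

    # drop the monitor from the cluster map
    ceph mon remove <mon-id>

    # force the orchestrator to forget the daemons and the dead host
    ceph orch daemon rm <daemon-name> --force
    ceph orch host rm <hostname> --force

    # if cephadm still complains, inspect what it has stored about the host
    ceph config-key dump | grep mgr/cephadm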
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx