Re: Return value from cephadm host-maintenance?



(adding the list back to the thread)

On Wednesday, March 27, 2024 12:54:34 PM EDT Daniel Brown wrote:
> John
> I got curious and was taking another quick look through the python script
> for cephadm.

That's always welcome. :-D

> This is probably too simple a question to be asking - or maybe I should
> say, I'm not expecting that there's a simple answer to what might seem
> like a simple question:
> Is there anything that notifies the cluster, or the other hosts in the
> cluster, when a host is going into maintenance mode, or is cephadm just
> running systemctl commands behind the scenes to stop and later restart
> the appropriate ceph containers locally on that host?
> Maybe a better way to say it would be: what differentiates maintenance
> mode from a host simply crashing or going offline?

I'll paraphrase Adam King, tech lead for cephadm here:

If one runs the command from the cephadm binary directly, it only disables and 
stops the systemd target. The intention is for users to use the `ceph orch 
host maintenance` ... commands instead.
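For reference, the direct invocation boils down to roughly the following on the host itself. This is a sketch only: `<fsid>` is a placeholder for your cluster's fsid, and the exact unit operations may differ between releases.

```shell
# Roughly what `cephadm host-maintenance enter` does locally:
systemctl disable ceph-<fsid>.target   # don't bring daemons back on reboot
systemctl stop ceph-<fsid>.target      # stop all ceph daemons on this host
```

Note that nothing here tells the rest of the cluster what is happening, which is exactly why the orch workflow is preferred.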

When you use the orch command (quoting Adam here):
when we put something into maintenance mode we
1) disable and stop the systemd target for the daemons on the host
2) set the noout flag for all the OSDs on that host
3) internally mark the host in cephadm as having a status of "maintenance", 
which has some effects such as not refreshing metadata on that host or 
attempting to place/remove daemons there
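The steps above are what you get from the intended workflow, which looks like this (the hostname is a placeholder):

```shell
# Enter maintenance: stops the host's daemons, sets noout for its OSDs,
# and marks the host's status as "maintenance" inside cephadm.
ceph orch host maintenance enter host1

# ... do the maintenance work on host1 ...

# Exit maintenance: restarts the daemons, clears the flag and the status.
ceph orch host maintenance exit host1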

The main differences from a host simply going offline are the noout flag on the 
OSDs, and that cephadm will not periodically try to check whether the host is 
alive, as it would for an offline host.

I believe the noout flag stops the cluster from trying to migrate all the data 
on those OSDs to other OSDs, as that shouldn't be necessary if they will be 
coming back.
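You can see and control the same flag manually. The cluster-wide commands below definitely exist; I believe recent releases can also scope the flag to a group (e.g. a host), but treat that form as an assumption and check your release's docs.

```shell
# Cluster-wide: down OSDs will not be marked "out", so no data migration starts.
ceph osd set noout
ceph osd unset noout

# Possibly available in your release - scope the flag to one host's OSDs:
ceph osd set-group noout host1
ceph osd unset-group noout host1
```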


The `cephadm host-maintenance enter` command is meant to be a component of the 
`ceph orch host maintenance` workflow. It still has a bug: the way it always 
exits with an error is wrong. You probably don't want to use it directly.

Reference links:

> > On Mar 22, 2024, at 6:26 AM, Daniel Brown <>
> > wrote:
> > 
> > 
> > Looks like it got OK’ed. I’ll put in something today.
> > 
> > 
> > --
> > Dan Brown
> > 
> >> On Mar 21, 2024, at 13:44, John Mulligan <phlogistonjohn@xxxxxxxxxxxxx>
> >> wrote:
> >> On Thursday, March 21, 2024 11:43:19 AM EDT Daniel Brown wrote:
> >>> Assuming I need admin approval to report this on tracker, how long
> >>> does it take to get approved? Signed up a couple days ago, but still
> >>> seeing "Your account was created and is now pending administrator
> >>> approval."
> >> 
> >> That's unfortunate. I pinged about your issue signing up on the ceph
> >> slack channel for infrastructure. Hopefully, that'll get somebody's
> >> attention. If you don't get access by tomorrow, feel free to ping me
> >> again directly and then *I'll* file the issue for you instead of having
> >> you wait around more.

ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

