Re: Seeking feedback on Improving cephadm bootstrap process

Hello all,

Thank you very much for your valuable feedback. I'd like to provide some
context and clarify a few points, as there seems to be some confusion
about the objective of this discussion and how an initial cephadm
bootstrap works.

As you know, Ceph can run multiple clusters on the same nodes, with
certain limitations that I won't delve into in this discussion. Each
cluster has its own unique identifier, the fsid, which is a UUID either
generated by cephadm during cluster bootstrap or provided by the user.
Almost all cluster-related files, including cluster and daemon
configurations, systemd units, logs, etc., are specific to each cluster
and are stored in dedicated directories keyed by the fsid, such as
/var/lib/ceph/<fsid>, /var/log/ceph/<fsid>, /run/ceph/<fsid>, and so on.
These directories ensure isolation between cluster files and daemons,
preventing any file or configuration sharing between clusters.
Typically, as a user, you need not concern yourself with the exact
location of the cluster files when deleting a cluster. For this purpose,
cephadm provides a dedicated command, "cephadm rm-cluster"
(https://docs.ceph.com/en/latest/cephadm/operations/#purging-a-cluster),
which handles the deletion of cluster files, the removal of daemons, and
so forth. Importantly, this command takes the fsid, which keeps it safe
in environments where multiple clusters coexist.
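
As a rough illustration of the layout described above, here is a small
Python sketch (not cephadm's actual code) that composes the per-fsid
paths and delegates the purge to "cephadm rm-cluster". The --fsid and
--force flags follow the purge documentation linked above; the fsid
value is a placeholder.

    #!/usr/bin/env python3
    """Sketch: per-fsid cluster paths and a scoped purge via cephadm."""
    import subprocess
    from pathlib import Path

    def cluster_paths(fsid: str) -> list[Path]:
        # Per-cluster directories keyed by the fsid, so clusters never
        # share files or configuration.
        return [
            Path("/var/lib/ceph") / fsid,  # daemon data and configuration
            Path("/var/log/ceph") / fsid,  # per-cluster logs
            Path("/run/ceph") / fsid,      # runtime state (admin sockets, ...)
        ]

    def purge_cluster(fsid: str) -> None:
        # Delegate the cleanup to cephadm; passing the fsid keeps the
        # operation scoped to a single cluster on multi-cluster hosts.
        subprocess.run(["cephadm", "rm-cluster", "--fsid", fsid, "--force"],
                       check=True)

    if __name__ == "__main__":
        fsid = "00000000-0000-0000-0000-000000000000"  # placeholder
        for p in cluster_paths(fsid):
            print(p, "exists" if p.exists() else "absent")
        # purge_cluster(fsid)  # only uncomment on a throwaway test host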

That being clarified, I want to emphasize that this discussion does not
revolve around the workings or options of the "cephadm rm-cluster"
command. That command is the official way to delete a cluster and is
used on both upstream and production clusters. If you have suggestions
for improving the user experience of that command, we can start a
separate thread for that purpose.

Back to the original subject:

During the process of bootstrapping a new cluster with cephadm, in
addition to installing files in their respective locations, core Ceph
daemons such as the mgr and mon are started. If the bootstrap process
succeeds, we end up with a minimal Ceph cluster consisting only of the
necessary files and daemons. If the bootstrap fails, a minimal, broken,
non-functional Ceph cluster is left behind with no actual data (no
OSDs), and potentially with some daemons (mgr/mon) still running on the
current node. Retaining these files and daemons provides no real benefit
to the user apart from facilitating the investigation of whatever bug or
issue prevented the bootstrap from completing. Even then, once the
investigation is complete and the issue is resolved, the user must
delete this cluster: it is useless, and its leftover daemons may still
be listening on the mon/mgr ports, blocking the creation of future
clusters on the same node.
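
To make the port problem concrete, here is an illustrative Python check
(an assumption on my side, not something cephadm ships) that probes the
default mon/mgr ports to see whether daemons left behind by a failed
bootstrap are still listening. Adjust the host and ports if your
deployment binds different addresses or overrides the defaults.

    #!/usr/bin/env python3
    """Sketch: detect leftover daemons holding the default mon/mgr ports."""
    import socket

    DEFAULT_PORTS = {
        3300: "mon (msgr2)",
        6789: "mon (msgr1)",
        8443: "mgr dashboard",
        9283: "mgr prometheus module",
    }

    def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
        # Use the mon IP instead of loopback if the daemons bind a
        # specific address on your node.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(0.5)
            return s.connect_ex((host, port)) == 0  # 0 = connection accepted

    if __name__ == "__main__":
        busy = {p: d for p, d in DEFAULT_PORTS.items() if port_in_use(p)}
        if busy:
            print("Ports still occupied (a leftover cluster may be running):")
            for port, desc in busy.items():
                print(f"  {port}: {desc}")
        else:
            print("No default mon/mgr ports in use on this node.")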

The purpose of this email thread is to discuss how to address this
situation. Given that we have full control over the bootstrap process,
we can automatically clean up this broken cluster (or at least assist
the user in doing so). The proposed rollback options are a manual
cleanup (option 1) or an automatic cleanup (option 2), as described in
the original email. The goal of this thread is to get feedback on your
preference as a user and to gather input on the additional information
you would like to see for each option.
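
To show the shape of the two options, here is a small sketch with
hypothetical helper names (this is not cephadm's implementation):
option 1 prints the exact cleanup command for the user to run after
debugging, while option 2 runs it automatically when bootstrap fails.

    #!/usr/bin/env python3
    """Sketch: manual (option 1) vs automatic (option 2) rollback."""
    import subprocess
    import sys

    def bootstrap(fsid: str) -> None:
        # Stand-in for the real bootstrap work (install files, start
        # mon/mgr, ...); here it simply fails to exercise the rollback.
        raise RuntimeError("simulated bootstrap failure")

    def rollback(fsid: str, automatic: bool) -> None:
        cleanup_cmd = ["cephadm", "rm-cluster", "--fsid", fsid, "--force"]
        if automatic:
            # Option 2: remove the broken, data-less cluster right away.
            subprocess.run(cleanup_cmd, check=True)
            print("Bootstrap failed; the partial cluster was removed.")
        else:
            # Option 1: keep everything for debugging and tell the user
            # exactly how to purge it once the investigation is done.
            print("Bootstrap failed; to remove the partial cluster run:")
            print("  " + " ".join(cleanup_cmd))

    if __name__ == "__main__":
        fsid = "00000000-0000-0000-0000-000000000000"  # placeholder
        try:
            bootstrap(fsid)
        except RuntimeError:
            rollback(fsid, automatic="--auto-rollback" in sys.argv)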

Side Note:

In response to the question of why we don't use a mechanism similar to
Rook's: cephadm is a "binary" meant for bare-metal deployments. Unlike
Rook, which operates within the framework of a higher-level
orchestration system such as k8s or OpenShift, cephadm has no daemon or
other high-level controller that can watch and fix a broken
installation. cephadm is the only binary needed (plus some minimal
dependencies) to bootstrap a new Ceph cluster.

Best Regards,
Redouane.


On Tue, May 30, 2023 at 10:30 AM Frank Schilder <frans@xxxxxx> wrote:

> Hi, I would like to second Nico's comment. What happened to the idea that
> a deployment tool should be idempotent? The most natural option would be:
>
> 1) start install -> something fails
> 2) fix problem
> 3) repeat exact same deploy command -> deployment picks up at current
> state (including cleaning up failed state markers) and tries to continue
> until next issue (go to 2)
>
> I'm not sure (meaning: it's a terrible idea) if it's a good idea to
> provide a single command to wipe a cluster, if only because of
> fat-finger syndrome. This seems safe only if it were possible to mark a
> cluster as production somehow (it must be sticky, that is, it cannot be
> unset), which prevents a cluster-destroy command (or any other too
> dangerous command) from executing. I understand the test case in the
> tracker, but having such test-case utilities that can run on a
> production cluster and destroy everything seems a bit dangerous.
>
> I think destroying a cluster should be a manual and tedious process,
> and figuring out how to do it should be part of the learning
> experience. So my answer to "how do I start over" would be "go figure
> it out, it's an important lesson".
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
> Sent: Friday, May 26, 2023 10:40 PM
> To: Redouane Kachach
> Cc: ceph-users@xxxxxxx
> Subject:  Re: Seeking feedback on Improving cephadm bootstrap
> process
>
>
> Hello Redouane,
>
> much appreciated kick-off for improving cephadm. I was wondering why
> cephadm does not use an approach similar to Rook's, in the sense of
> "repeat until it is fixed"?
>
> For background, Rook uses a controller that checks the state of the
> cluster, the state of the monitors, whether there are disks to be
> added, etc. It periodically restarts these checks and, when needed,
> shifts monitors, creates OSDs, and so on.
>
> My question is, why not have a daemon or checker subcommand of cephadm
> that a) checks what the current cluster status is (i.e. cephadm
> verify-cluster) and b) fixes the situation (i.e. cephadm
> verify-and-fix-cluster)?
>
> I think that option would be much more beneficial than the other two
> suggested ones.
>
> Best regards,
>
> Nico
>
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



