Hey Frank,

regarding destroying a cluster, I'd suggest reusing the old
--yes-i-really-mean-it parameter, as it is already in use by
ceph osd destroy [0]. Then it doesn't matter whether it's prod or not,
if you really mean it ... ;-)

Best regards,

Nico

[0] https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/

Frank Schilder <frans@xxxxxx> writes:

> Hi, I would like to second Nico's comment. What happened to the idea
> that a deployment tool should be idempotent? The most natural option
> would be:
>
> 1) start install -> something fails
> 2) fix the problem
> 3) repeat the exact same deploy command -> deployment picks up at the
>    current state (including cleaning up failed state markers) and
>    tries to continue until the next issue (go to 2)
>
> I'm not sure (meaning: it's a terrible idea) whether it's a good idea
> to provide a single command to wipe a cluster, if only because of
> fat-finger syndrome. This seems safe only if it were possible to mark
> a cluster as production somehow (the flag must be sticky, that is, it
> cannot be unset), which would prevent a cluster-destroy command (or
> any overly dangerous command) from executing. I understand the test
> case in the tracker, but having such test-case utilities that can run
> on a production cluster and destroy everything seems a bit dangerous.
>
> I think destroying a cluster should be a manual and tedious process,
> and figuring out how to do it should be part of the learning
> experience. So my answer to "how do I start over" would be "go figure
> it out, it's an important lesson".
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
> Sent: Friday, May 26, 2023 10:40 PM
> To: Redouane Kachach
> Cc: ceph-users@xxxxxxx
> Subject: Re: Seeking feedback on Improving cephadm bootstrap process
>
>
> Hello Redouane,
>
> much appreciated kick-off for improving cephadm. I was wondering why
> cephadm does not use an approach similar to rook's, in the sense of
> "repeat until it is fixed"?
>
> For background, rook uses a controller that checks the state of the
> cluster, the state of the monitors, whether there are disks to be
> added, etc. It periodically re-runs the checks and, when needed,
> shifts monitors, creates OSDs, and so on.
>
> My question is: why not have a daemon or checker subcommand of
> cephadm that a) checks what the current cluster status is (i.e.
> cephadm verify-cluster) and b) fixes the situation (i.e. cephadm
> verify-and-fix-cluster)?
>
> I think that option would be much more beneficial than the other two
> suggested ones.
>
> Best regards,
>
> Nico

--
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
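
A minimal sketch of the "repeat until it is fixed" controller pattern
Nico describes, assuming hypothetical check_*/fix_* helpers and the
proposed verify-cluster / verify-and-fix-cluster names; this is not
real cephadm or rook code, only an illustration of the idea:

    #!/usr/bin/env python3
    # Illustration only: a rook-style reconcile loop as it might look
    # behind a hypothetical `cephadm verify-and-fix-cluster`.
    import time

    def check_monitors():
        """Placeholder: return a list of problems found with the monitors."""
        return []            # e.g. ["mon.b is down"]

    def check_osds():
        """Placeholder: return a list of missing or failed OSDs."""
        return []            # e.g. ["/dev/sdc on host3 has no OSD"]

    def fix(problem):
        """Placeholder: try to repair a single reported problem."""
        print("repairing:", problem)

    def reconcile_once():
        """One pass: gather problems, fix them, return how many were found.
        Roughly what a `cephadm verify-cluster` check could report."""
        problems = check_monitors() + check_osds()
        for p in problems:
            fix(p)
        return len(problems)

    def reconcile_loop(interval=60):
        """Keep re-checking until the cluster converges to the desired
        state; a daemon or `verify-and-fix-cluster` subcommand would run
        something like this."""
        while reconcile_once() > 0:
            time.sleep(interval)

    if __name__ == "__main__":
        reconcile_loop()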