I'm afraid I've never paid much attention to "ceph orch ls osd". Mostly I
look at ceph orch ps or ceph osd tree, and I never noticed whether they had
a specific indicator for managed/unmanaged.

The spec file is bog-standard YAML (YAML Ain't Markup Language), and the
actual valid elements for a given spec are defined by its consumer. There's
very good documentation on the spec info for OSDs. I think that Ceph can
actually run off a master spec file with all of the options for all types
of resources in it, but I cannot be certain.

"*" is simply a wildcard pattern meaning "all servers". If I had, for
example, only certain hosts reserved for OSDs, say
'osd01.ceph.mousetech.com', then a pattern like 'osd*.ceph.mousetech.com'
might work. (A rough example of such a pattern-based spec is appended at
the very bottom of this message, below the quoted thread.)

OSDs do not automatically convert. You have to issue an explicit cephadm
command to make a legacy OSD become an administered one, and it will fail
if you've done something like double-define an OSD.

Tim

On Fri, 2024-11-01 at 14:28 -0400, Dave Hall wrote:
> Tim,
> 
> Actually, the links that Eugen shared earlier were sufficient. I ended up
> with:
> 
> service_type: osd
> service_name: osd
> placement:
>   host_pattern: 'ceph01'
> spec:
>   data_devices:
>     rotational: 1
>   db_devices:
>     rotational: 0
> 
> This worked exactly right as far as creating the OSD - it found and reused
> the same OSD number that was previously destroyed, and also recreated the
> WAL/DB LV using the 'blank spot' on the NVMe drive.
> 
> However, I'm a bit concerned that the output of 'ceph orch ls osd' has
> changed in a way that might not be quite good:
> 
> NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> osd             32    3m ago     52m  ceph01
> 
> Before all of this started this line used to contain the word 'unmanaged'
> somewhere. Eugen and I were having a side discussion about how to make all
> of my OSDs managed without destroying them, so I could do things like
> 'ceph orch restart osd' to restart all of the OSDs to assure that they
> pick up changes to attributes like osd_memory_target and
> osd_memory_target_autotune.
> 
> So, in applying this spec, did I make all my OSDs managed, or just all of
> the ones on ceph01, or just the one that got created when I applied the
> spec?
> 
> When I add my next host, should I change the placement to that host name
> or to '*'?
> 
> More generally, is there a higher-level document that talks about Ceph
> spec files and the orchestrator - something that deals with the general
> concepts?
> 
> Thanks.
> 
> -Dave
> 
> --
> Dave Hall
> Binghamton University
> kdhall@xxxxxxxxxxxxxx
> 
> On Fri, Nov 1, 2024 at 1:40 PM Tim Holloway <timh@xxxxxxxxxxxxx> wrote:
> 
> > I can't offer a spec off the cuff, but if the LV still exists and you
> > don't need to change its size, then I'd zap it to remove residual Ceph
> > info, because otherwise the operation will complain and fail.
> > 
> > Having done that, the requirements should be the same as a first-time
> > construction of an OSD on that LV. Eugen can likely give you the spec
> > info. I'd have to RTFM.
> > 
> > Tim
> > 
> > On 11/1/24 11:22, Dave Hall wrote:
> > > Tim, Eugen,
> > > 
> > > So what would a spec file look like for a single OSD that uses a
> > > specific HDD (/dev/sdi), with WAL/DB on an LV that's 25% of a specific
> > > NVMe drive? Regarding the NVMe, there are 3 other OSDs already using
> > > 25% each of this NVMe for WAL/DB, but I have removed the LV that was
> > > used by the failed OSD.
> > > Do I need to pre-create the LV, or will 'ceph orch' do that for me?
> > > 
> > > Thanks.
> > > 
> > > -Dave
> > > 
> > > --
> > > Dave Hall
> > > Binghamton University
> > > kdhall@xxxxxxxxxxxxxx
> > > 
> > > On Thu, Oct 31, 2024 at 3:52 PM Tim Holloway <timh@xxxxxxxxxxxxx>
> > > wrote:
> > > 
> > > > I migrated from gluster when I found out it's going unsupported
> > > > shortly. I'm really not big enough for Ceph proper, but there were
> > > > only so many supported distributed filesystems with triple
> > > > redundancy.
> > > > 
> > > > Where I got into trouble was that I started off with Octopus, and
> > > > Octopus had some teething pains - like stalling scheduled operations
> > > > until the system was clean, but the only way to get a clean system
> > > > was to run the stalled operations. Pacific cured that for me.
> > > > 
> > > > But the docs were and remain somewhat fractured between legacy and
> > > > managed services, and I managed to get into a real mess there,
> > > > especially since I was wildly trying anything to get those stalled
> > > > fixes to take.
> > > > 
> > > > Since then, I've pretty much redefined all my OSDs with fewer but
> > > > larger datastores and made them all managed. Now if I could just
> > > > persuade the auto-tuner to fix the PG sizes.
> > > > 
> > > > I'm in the process of opening a ticket account right now. The fun
> > > > part of this is that realistically, older docs need a re-write just
> > > > as much as the docs for the current release.
> > > > 
> > > > Tim
> > > > 
> > > > On 10/31/24 15:39, Eugen Block wrote:
> > > > > I completely understand your point of view. Our own main cluster
> > > > > is also a bit "wild" in its OSD layout, that's why its OSDs are
> > > > > "unmanaged" as well. When we adopted it via cephadm, I started to
> > > > > create suitable osd specs for all those hosts and OSDs, and I gave
> > > > > up. :-D But since we sometimes also tend to experiment a bit, I'd
> > > > > rather have full control over it. That's why we also have
> > > > > osd_crush_initial_weight = 0, to check the OSD creation before
> > > > > letting Ceph remap any PGs.
> > > > > 
> > > > > It definitely couldn't hurt to clarify the docs; you can always
> > > > > report on tracker.ceph.com if you have any improvement ideas.
> > > > > 
> > > > > Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > > > 
> > > > > > I have been slowly migrating towards spec files, as I prefer
> > > > > > declarative management as a rule.
> > > > > > 
> > > > > > However, I think that we may have a dichotomy in the user base.
> > > > > > 
> > > > > > On the one hand, users with dozens/hundreds of servers/drives of
> > > > > > basically identical character.
> > > > > > 
> > > > > > On the other, I'm one who's running fewer servers, and for
> > > > > > historical reasons they tend to be wildly individualistic and
> > > > > > often have blocks of future-use space reserved for non-ceph
> > > > > > storage.
> > > > > > 
> > > > > > Ceph, left to its own devices (no pun intended), can be quite
> > > > > > enthusiastic about adopting any storage it can find. And that's
> > > > > > great for users in the first category, which is what the spec
> > > > > > information in the supplied links is emphasizing.
> > > > > > But for us lesser creatures who feel the need to manually
> > > > > > control where each OSD goes and how it's configured, it's not so
> > > > > > simple. I'm fairly certain that there's documentation on the
> > > > > > spec file setup for that sort of stuff in the online docs, but
> > > > > > it's located somewhere else and I cannot recall where.
> > > > > > 
> > > > > > At any rate, I would consider it very important that the
> > > > > > different ways to set up an OSD should explicitly indicate in
> > > > > > their documentation which type of OSD will be generated.
> > > > > > 
> > > > > > Tim
> > > > > > 
> > > > > > On 10/31/24 14:28, Eugen Block wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > the preferred method to deploy OSDs in cephadm-managed
> > > > > > > clusters is spec files; see this part of the docs [0] for more
> > > > > > > information. I would just not use the '--all-available-devices'
> > > > > > > flag, except in test clusters, or if you're really sure that
> > > > > > > this is what you want.
> > > > > > > 
> > > > > > > If you use 'ceph orch daemon add osd ...', you'll end up with
> > > > > > > one (or more) OSD(s), but they will be unmanaged, as you
> > > > > > > already noted in your own cluster. There are a couple of
> > > > > > > examples with advanced specs (e.g. DB/WAL on dedicated
> > > > > > > devices) in the docs as well [1]. So my recommendation would
> > > > > > > be to have a suitable spec file for your disk layout. You can
> > > > > > > always check with the '--dry-run' flag before actually
> > > > > > > applying it:
> > > > > > > 
> > > > > > > ceph orch apply -i osd-spec.yaml --dry-run
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Eugen
> > > > > > > 
> > > > > > > [0] https://docs.ceph.com/en/latest/cephadm/services/osd/#deploy-osds
> > > > > > > [1] https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications
> > > > > > > 
> > > > > > > Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > > > > > 
> > > > > > > > As I understand it, the manual OSD setup is only for legacy
> > > > > > > > (non-container) OSDs. Directory locations are wrong for
> > > > > > > > managed (containerized) OSDs, for one.
> > > > > > > > 
> > > > > > > > Actually, the whole manual setup docs ought to be moved out
> > > > > > > > of the mainline documentation. In their present arrangement,
> > > > > > > > they make legacy setup sound like the preferred method. And
> > > > > > > > have you noticed that there is no corresponding well-marked
> > > > > > > > section titled "Automated (cephadm) setup"?
> > > > > > > > 
> > > > > > > > This is how we end up with OSDs that are simultaneously
> > > > > > > > legacy AND administered, since at last count there are no
> > > > > > > > interlocks within Ceph to prevent such a mess.
> > > > > > > > 
> > > > > > > > Tim
> > > > > > > > 
> > > > > > > > On 10/31/24 13:39, Dave Hall wrote:
> > > > > > > > > Hello.
> > > > > > > > > 
> > > > > > > > > Sorry if it appears that I am reposting the same issue
> > > > > > > > > under a different topic.
> > > > > > > > > However, I feel that the problem has moved and I now have
> > > > > > > > > different questions.
> > > > > > > > > 
> > > > > > > > > At this point I have, I believe, removed all traces of
> > > > > > > > > OSD.12 from my cluster - based on steps in the Reef docs at
> > > > > > > > > https://docs.ceph.com/en/reef/rados/operations/add-or-rm-osds/#.
> > > > > > > > > I have further located and removed the WAL/DB LV on an
> > > > > > > > > associated NVMe drive (shared with 3 other OSDs).
> > > > > > > > > 
> > > > > > > > > I don't believe the instructions for replacing an OSD
> > > > > > > > > (ceph-volume lvm prepare) still apply, so I have been
> > > > > > > > > trying to work with the instructions under ADDING AN OSD
> > > > > > > > > (MANUAL).
> > > > > > > > > 
> > > > > > > > > However, since my installation is containerized (Podman),
> > > > > > > > > it is unclear which steps should be issued on the host and
> > > > > > > > > which within 'cephadm shell'.
> > > > > > > > > 
> > > > > > > > > There is also another ambiguity: in step 3 the instruction
> > > > > > > > > is to 'mkfs -t {fstype}' and then to 'mount -o user_xattr'.
> > > > > > > > > However, which fs type?
> > > > > > > > > 
> > > > > > > > > After this, in step 4, the 'ceph-osd -i {osd-id} --mkfs
> > > > > > > > > --mkkey' throws errors about the keyring file.
> > > > > > > > > 
> > > > > > > > > So, are these the right instructions to be using in a
> > > > > > > > > containerized installation? Are there, in general,
> > > > > > > > > alternate documents for containerized installations?
> > > > > > > > > 
> > > > > > > > > Lastly, the above-cited instructions don't say anything
> > > > > > > > > about the separate WAL/DB LV.
> > > > > > > > > 
> > > > > > > > > Please advise.
> > > > > > > > > 
> > > > > > > > > Thanks.
> > > > > > > > > 
> > > > > > > > > -Dave
> > > > > > > > > 
> > > > > > > > > --
> > > > > > > > > Dave Hall
> > > > > > > > > Binghamton University
> > > > > > > > > kdhall@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
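
Appended example (referenced at the top of this message): a minimal sketch
of the kind of pattern-based spec discussed above, assuming hostnames of
the form 'osd*.ceph.mousetech.com' and a service_id of
'spinning_hdd_with_nvme_db' (both placeholders - substitute your own hosts
and naming). It simply mirrors Dave's working spec but widens the placement
from a single host to a glob pattern:

  service_type: osd
  service_id: spinning_hdd_with_nvme_db
  placement:
    host_pattern: 'osd*.ceph.mousetech.com'
  spec:
    data_devices:
      rotational: 1
    db_devices:
      rotational: 0

As Eugen noted earlier in the thread, 'ceph orch apply -i osd-spec.yaml
--dry-run' previews which hosts and disks would be claimed before anything
is actually created.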
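On the legacy-to-managed conversion mentioned above: as far as I recall,
the explicit cephadm command is the adoption mechanism, run on the host
that owns the daemon. Something along these lines (the OSD id is a
placeholder; check the cephadm adoption docs for your release before
relying on this):

  cephadm adopt --style legacy --name osd.<id>

That matches Tim's warning that the conversion will fail if an OSD has
ended up double-defined.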