Re: [External Email] Re: Recreate Destroyed OSD

Tim,

Actually, the links Eugen shared earlier were sufficient.  I ended up with

service_type: osd
service_name: osd
placement:
  host_pattern: 'ceph01'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0


This worked exactly as intended for creating the OSD - it found and reused
the OSD number that was previously destroyed, and it also recreated the
WAL/DB LV in the 'blank spot' on the NVMe drive.
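
For the record, applying it is just the 'ceph orch apply' step Eugen
mentioned earlier - roughly the following, where the filename is arbitrary
and the --dry-run pass is optional but worth doing first:

ceph orch apply -i osd-spec.yaml --dry-run   # preview what would be created
ceph orch apply -i osd-spec.yaml
ceph osd tree                                # confirm the old OSD id came back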

However, I'm a bit concerned that the output of 'ceph orch ls osd' has
changed in a way that might not be a good thing:

NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd               32  3m ago     52m  ceph01


Before all of this started, this line used to contain the word 'unmanaged'
somewhere.  Eugen and I were having a side discussion about how to make all
of my OSDs managed without destroying them, so that I could do things like
'ceph orch restart osd' to restart all of the OSDs and make sure they pick up
changes to attributes like osd_memory_target and osd_memory_target_autotune.
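
Side note: I believe the value an individual OSD is actually running with can
be checked with something like the commands below (I haven't verified the
exact output format):

ceph config show osd.12 osd_memory_target   # running value on one daemon
ceph config get osd osd_memory_target       # value set at the 'osd' level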

So, in applying this spec, did I make all my OSDs managed, or just all of
the ones on ceph01, or just the one that got created when I applied the
spec?
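
(I'm guessing 'ceph orch ls osd --export' would show exactly what the service
now covers - I'll look at that - but confirmation of the intended behavior
would help.)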

When I add my next host, should I change the placement to that host name or
to '*'?
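
My guess is that the cluster-wide version would just be the same spec with a
broader placement - something like the sketch below, though I haven't tried
it yet:

service_type: osd
service_name: osd
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0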

More generally, is there a higher-level document that talks about Ceph spec
files and the orchestrator - something that deals with the general concepts?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx

On Fri, Nov 1, 2024 at 1:40 PM Tim Holloway <timh@xxxxxxxxxxxxx> wrote:

> I can't offer a spec off the cuff, but if the LV still exists and you
> don't need to change its size, then I'd zap it to remove residual Ceph
> info because otherwise the operation will complain and fail.
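>
> Something along these lines is probably the zap step (the VG/LV names are
> placeholders - check 'lvs' on the host for the real ones), run from inside
> 'cephadm shell':
>
> ceph-volume lvm zap /dev/<vg_name>/<lv_name>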
>
> Having done that, the requirements should be the same as a first-time
> construction of an OSD on that LV. Eugen can likely give you the spec
> info. I'd have to RTFM.
>
>     Tim
>
>
> On 11/1/24 11:22, Dave Hall wrote:
> > Tim, Eugen,
> >
> > So what would a spec file look like for a single OSD that uses a specific
> > HDD (/dev/sdi) and with WAL/DB on an LV that's 25% of a specific NVMe
> > drive?  Regarding the NVMe, there are 3 other OSDs already using 25% each
> > of this NVMe for WAL/DB, but I have removed the LV that was used by the
> > failed OSD.  Do I need to pre-create the LV, or will 'ceph orch' do that
> > for me?
> >
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> >
> > On Thu, Oct 31, 2024 at 3:52 PM Tim Holloway <timh@xxxxxxxxxxxxx> wrote:
> >
> >> I migrated from Gluster when I found out it was going unsupported shortly.
> >> I'm really not big enough for Ceph proper, but there were only so many
> >> supported distributed filesystems with triple redundancy.
> >>
> >> Where I got into trouble was that I started off with Octopus, and Octopus
> >> had some teething pains - like stalling scheduled operations until the
> >> system was clean, when the only way to get a clean system was to run the
> >> stalled operations. Pacific cured that for me.
> >>
> >> But the docs were and remain somewhat fractured between legacy and
> >> managed services and I managed to get into a real mess there, especially
> >> since I was wildly trying anything to get those stalled fixes to take.
> >>
> >> Since then, I've pretty much redefined all my OSDs with fewer but larger
> >> datastores and made them all managed. Now if I could just persuade the
> >> auto-tuner to fix the PG sizes...
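> >>
> >> ('ceph osd pool autoscale-status' at least shows what the autoscaler
> >> currently thinks the per-pool pg_num targets should be, for what that's
> >> worth.)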
> >>
> >> I'm in the process of opening a ticket account right now. The fun part
> >> of this is that realistically, older docs need a re-write just as much
> >> as the docs for the current release.
> >>
> >>      Tim
> >>
> >> On 10/31/24 15:39, Eugen Block wrote:
> >>> I completely understand your point of view. Our own main cluster is
> >>> also a bit "wild" in its OSD layout; that's why its OSDs are
> >>> "unmanaged" as well. When we adopted it via cephadm, I started to
> >>> create suitable OSD specs for all those hosts and OSDs and gave up.
> >>> :-D But since we sometimes also tend to experiment a bit, I'd rather
> >>> have full control over it. That's why we also have
> >>> osd_crush_initial_weight = 0, to check the OSD creation before letting
> >>> Ceph remap any PGs.
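> >>>
> >>> (That can be set with something like 'ceph config set osd
> >>> osd_crush_initial_weight 0', or in ceph.conf; new OSDs then come up with
> >>> CRUSH weight 0 until you reweight them manually.)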
> >>>
> >>> It definitely couldn't hurt to clarify the docs, you can always report
> >>> on tracker.ceph.com if you have any improvement ideas.
> >>>
> >>> Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
> >>>
> >>>> I have been slowly migrating towards spec files as I prefer
> >>>> declarative management as a rule.
> >>>>
> >>>> However, I think that we may have a dichotomy in the user base.
> >>>>
> >>>> On the one hand, users with dozens/hundreds of servers/drives of
> >>>> basically identical character.
> >>>>
> >>>> On the other, I'm one who's running fewer servers and for historical
> >>>> reasons they tend to be wildly individualistic and often have blocks
> >>>> of future-use space reserved for non-ceph storage.
> >>>>
> >>>> Ceph, left to its own devices (no pun intended), can be quite
> >>>> enthusiastic about adopting any storage it can find. And that's great
> >>>> for users in the first category. Which is what the spec information
> >>>> in the supplied links is emphasizing. But for us lesser creatures who
> >>>> feel the need to manually control where each OSD goes and how it's
> >>>> configured, it's not so simple. I'm fairly certain that there's
> >>>> documentation on the spec file setup for that sort of stuff in the
> >>>> online docs, but it's located somewhere else and I cannot recall where.
> >>>>
> >>>> At any rate, I would consider it very important that the documentation
> >>>> for the different ways to set up an OSD explicitly indicate which type
> >>>> of OSD will be generated.
> >>>>
> >>>>     Tim
> >>>>
> >>>>
> >>>> On 10/31/24 14:28, Eugen Block wrote:
> >>>>> Hi,
> >>>>>
> >>>>> the preferred method to deploy OSDs in cephadm-managed clusters is via
> >>>>> spec files; see this part of the docs [0] for more information. I
> >>>>> would just not use the '--all-available-devices' flag, except in
> >>>>> test clusters, or if you're really sure that this is what you want.
> >>>>>
> >>>>> If you use 'ceph orch daemon add osd ...', you'll end up with one
> >>>>> (or more) OSD(s), but they will be unmanaged, as you already noted
> >>>>> in your own cluster. There are a couple of examples with advanced
> >>>>> specs (e.g. DB/WAL on dedicated devices) in the docs as well [1].
> >>>>> So my recommendation would be to have a spec file suited to your
> >>>>> disk layout. You can always check with the '--dry-run' flag before
> >>>>> actually applying it:
> >>>>>
> >>>>> ceph orch apply -i osd-spec.yaml --dry-run
> >>>>>
> >>>>> Regards,
> >>>>> Eugen
> >>>>>
> >>>>> [0] https://docs.ceph.com/en/latest/cephadm/services/osd/#deploy-osds
> >>>>> [1] https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications
> >>>>>
> >>>>> Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
> >>>>>
> >>>>>> As I understand it, the manual OSD setup is only for legacy
> >>>>>> (non-container) OSDs. Directory locations are wrong for managed
> >>>>>> (containerized) OSDs, for one.
> >>>>>>
> >>>>>> Actually, the whole manual setup docs ought to be moved out of the
> >>>>>> mainline documentation. In their present arrangement, they make
> >>>>>> legacy setup sound like the preferred method. And have you noticed
> >>>>>> that there is no corresponding well-marked section titled
> >>>>>> "Authomated (cephadmin) setup?".
> >>>>>>
> >>>>>> This is how we end up with OSDs that are simultaneously legacy AND
> >>>>>> cephadm-administered, since at last count there are no
> >>>>>> interlocks within Ceph to prevent such a mess.
> >>>>>>
> >>>>>>     Tim
> >>>>>>
> >>>>>> On 10/31/24 13:39, Dave Hall wrote:
> >>>>>>> Hello.
> >>>>>>>
> >>>>>>> Sorry if it appears that I am reposting the same issue under a
> >>>>>>> different
> >>>>>>> topic.  However, I feel that the problem has moved and I now have
> >>>>>>> different
> >>>>>>> questions.
> >>>>>>>
> >>>>>>> At this point I have, I believe, removed all traces of OSD.12 from my
> >>>>>>> cluster - based on steps in the Reef docs at
> >>>>>>> https://docs.ceph.com/en/reef/rados/operations/add-or-rm-osds/#.  I
> >>>>>>> have further located and removed the WAL/DB LV on an associated NVMe
> >>>>>>> drive (shared with 3 other OSDs).
> >>>>>>>
> >>>>>>> I don't believe the instructions for replacing an OSD (ceph-volume
> >>>>>>> lvm
> >>>>>>> prepare) still apply, so I have been trying to work with the
> >>>>>>> instructions
> >>>>>>> under ADDING AN OSD (MANUAL).
> >>>>>>>
> >>>>>>> However, since my installation is containerized (Podman), it is
> >>>>>>> unclear
> >>>>>>> which steps should be issued on the host and which within 'cephadm
> >>>>>>> shell'.
> >>>>>>>
> >>>>>>> There is also another ambiguity:  In step 3 the instruction is to
> >>>>>>> 'mkfs -t {fstype}' and then to 'mount -o user_xattr'.  However,
> >>>>>>> which fs type?
> >>>>>>>
> >>>>>>> After this, in step 4, 'ceph-osd -i {osd-id} --mkfs --mkkey' throws
> >>>>>>> errors about the keyring file.
> >>>>>>>
> >>>>>>> So, are these the right instructions to be using in a containerized
> >>>>>>> installation?  Are there, in general, alternate documents for
> >>>>>>> containerized
> >>>>>>> installations?
> >>>>>>>
> >>>>>>> Lastly, the above cited instructions don't say anything about the
> >>>>>>> separate
> >>>>>>> WAL/DB LV.
> >>>>>>>
> >>>>>>> Please advise.
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> -Dave
> >>>>>>>
> >>>>>>> --
> >>>>>>> Dave Hall
> >>>>>>> Binghamton University
> >>>>>>> kdhall@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



