Re: ceph octopus mysterious OSD crash

D'ohhh...
I read David's procedure. And was surprised.

I thought the wording of the --replace flag was so blindingly obvious that I didn't need to read the docs. Except apparently I do.


(ceph orch osd rm --replace)
"This follows the same procedure as the “Remove OSD” part with the exception that the OSD is not permanently removed from the CRUSH hierarchy, but is assigned a ‘destroyed’ flag."

Oh come on, guys...
This is a flag to an *orchestration* suite.
Don't you think that when the user/admin uses a --replace flag, the orchestration suite should.. ya know.. orchestrate a replacement?

Otherwise, in my opinion, that flag really needs to be renamed to something else.





----- Original Message -----
From: "David Orman" <ormandj@xxxxxxxxxxxx>
To: "Eugen Block" <eblock@xxxxxx>
Cc: "Stefan Kooman" <stefan@xxxxxx>, "ceph-users" <ceph-users@xxxxxxx>, "Philip Brown" <pbrown@xxxxxxxxxx>
Sent: Thursday, March 25, 2021 12:04:17 PM
Subject: Re: ceph octopus mysterious OSD crash

As we wanted to verify this behavior with 15.2.10, we went ahead and
tested with a failed OSD. The drive was replaced, and we followed the
steps below (comments for clarity on our process) - this assumes you
have a service specification that will perform deployment once
matched:

# capture "db device" associated with OSD
ceph-volume list | less
# drain drive if possible, do this when planning replacement,
otherwise do once failure has occurred
ceph orch osd rm 391 --replace
# One drained (or if failure occurred), using "db device" path from
the ceph-volume list
lvremove /dev/ceph-blah/osd-db-blah
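# (optional sanity check, not in the original steps: confirm the VG now
# shows free space for the new DB LV; 'ceph-blah' is the placeholder VG
# name from the lvremove above)
vgs ceph-blah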
# monitor ceph for the replacement
ceph -W cephadm
# once the daemon has been deployed ("TIMESTAMP mgr.cephXX.XXXXX [INF]
# Deploying daemon osd.391 on cephXX"), watch for the rebalance to complete
ceph -s
--------------------
### consider increasing max_backfills if it's just a single drive replacement:
ceph config set osd osd_max_backfills 10
### if you do, after backfilling is complete:
ceph config rm osd osd_max_backfills
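### (not part of the original steps: a quick way to verify the override
### took effect, and later that it was removed)
ceph config get osd osd_max_backfills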

Following these steps, as soon as we completed the lvremove of the db
device in question, the OSD was rebuilt, and we verified a new
NVME-based db LV was created as per our specification:

service_type: osd
service_id: osd_spec_XXXXX
service_name: osd.osd_spec_XXXX
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  db_slots: 12
  filter_logic: AND
  objectstore: bluestore
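
For anyone replicating this, the spec gets applied the usual way (the
filename here is just an example; --dry-run previews what cephadm would
do before committing):

ceph orch apply -i osd_spec.yml --dry-run
ceph orch apply -i osd_spec.yml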

Hope this helps out others in the future who need to deal with drive
replacements on cephadm/containerized deployments,
David

On Fri, Mar 19, 2021 at 4:57 PM David Orman <ormandj@xxxxxxxxxxxx> wrote:
>
> We also ran into a scenario in which I did exactly this, and it did
> _not_ work. It created the OSD, but did not put the DB/WAL on the NVME
> (it didn't even create an LV). I'm wondering if there's some constraint
> applied (I haven't looked at the code yet) such that when the NVME already
> holds all but the one DB, it may not have the minimum space required
> (even though there's plenty of space based on the specification).
>
> Our service specification looks like this:
>
> service_type: osd
> service_id: osd_spec_test
> placement:
>   host_pattern: '*'
> data_devices:
>   rotational: 1
> db_devices:
>   rotational: 0
> db_slots: 12
>
> It works fine when fed an empty machine, but I've yet to get it to
> work when I've had an OSD fail, and I wipe out the LV for the DB and
> OSD. I'll get a new OSD, but no DB. On one of our clusters, due to the
> NVME sizing (800GB / 745.2G usable) and 24 OSDs, the DBs (12 per NVME,
> two NVMEs per server) end up being ~62.1G each, so there's about 62.1G
> free when we clear out the LV. I'm not sure why it doesn't 'do the
> right thing' and use that space when spinning up the replaced OSD.
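>
> (A hypothetical debugging step at that point: checking whether the
> orchestrator still considers the NVME usable at all:)
>
> ceph orch device ls   # shows per-device AVAIL status and rejection reasons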
>
> I'm also curious what happens if two OSDs were to fail, you deleted
> two DBs, then added one OSD back. Would Ceph be smart enough to see
> the 12 slots per non-rotational in the osd specification and not
> allocate a 124.2G DB/WAL to that single OSD, preserving enough space
> for a second (for adding the second OSD later) - assuming this entire
> process worked as designed?
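>
> (For clarity, the arithmetic behind those numbers: 745.2G usable / 12
> slots = 62.1G per DB, so naively handing two freed slots to a single
> OSD would mean 2 x 62.1G = 124.2G.)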
>
> David
>
> On Fri, Mar 19, 2021 at 4:20 PM Eugen Block <eblock@xxxxxx> wrote:
> >
> > I am quite sure that this case is covered by cephadm already. A few
> > months ago I tested it after a major rework of ceph-volume. I don’t
> > have any links right now. But I had a lab environment with multiple
> > OSDs per node with rocksDB on SSD and after wiping both HDD and DB LV
> > cephadm automatically redeployed the OSD according to my drive group
> > file.
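> >
> > (For reference, the wiping step can be done through the orchestrator
> > as well; the hostname and device path here are placeholders:)
> >
> > ceph orch device zap host01 /dev/sdX --force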
> >
> >
> > Zitat von Stefan Kooman <stefan@xxxxxx>:
> >
> > > On 3/19/21 7:47 PM, Philip Brown wrote:
> > >
> > > I see.
> > >
> > >>
> > >> I don't think it works when 7/8 devices are already configured, and
> > >> the SSD is already mostly sliced.
> > >
> > > OK. If it is a test cluster you might just blow it all away. By
> > > doing this you are simulating an "SSD" failure taking down all HDDs
> > > with it. It sure isn't pretty. I would say the situation you ended
> > > up with is not a corner case by any means. I am afraid I would
> > > really need to set up a test cluster with cephadm to help you
> > > further at this point, besides the suggestion above.
> > >
> > > Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



