Re: ceph osd down doesn't seem to work

Hoi Josh,

thanks for the explanation, I want to mark it out, not down :-)

Most use of our cluster is in EC 8+3 or 5+4 pools, so one missing OSD isn't a big problem, but if some of the blocks can still be read, it may help to move them to safety. (This is how I imagine things anyway ;-)

I'll have to look into manually correcting those inconsistent PGs if they don't recover by ceph-magic alone...
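
In case it's useful to anyone following along, this is roughly what I have noted down so far (the PG id 12.1f is just a placeholder, and I haven't yet checked how much of this applies to EC pools):

  # list the PGs currently flagged inconsistent
  ceph health detail | grep inconsistent

  # see which objects/shards actually disagree
  rados list-inconsistent-obj 12.1f --format=json-pretty

  # ask the primary to repair the PG from the remaining good copies/shards
  ceph pg repair 12.1f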

Cheers

/Simon

On 03/10/2023 18:21, Josh Baergen wrote:
Hi Simon,

If the OSD is actually up, using 'ceph osd down' will cause it to flap
but come back up immediately. To prevent this, you would want to 'ceph
osd set noup'. However, I don't think this is what you actually want:

I'm thinking (but perhaps incorrectly?) that it would be good to keep the OSD down+in, to try to read from it as long as possible

In this case, you actually want it up+out ('ceph osd out XXX'), though
if it's replicated then marking it out will switch primaries around so
that it's not actually read from anymore. It doesn't look like you
have that much recovery backfill left, so hopefully you'll be in a
clean state soon, though you'll have to deal with those 'inconsistent'
and 'recovery_unfound' PGs.
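
Roughly, something like this (where XXX is the OSD id, as above):

  # keep the daemon running but tell CRUSH to stop placing data on it
  ceph osd out XXX

  # then keep an eye on recovery and the problem PGs
  ceph -s
  ceph health detail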

Josh

On Tue, Oct 3, 2023 at 10:14 AM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:

Hi

I'm trying to mark one OSD as down, so we can clean it out and replace
it. It keeps getting medium read errors, so it's bound to fail sooner
rather than later. When I tell ceph from the mon to mark the OSD down,
it doesn't actually stay that way. When the service on the OSD stops, it
is also marked out, and I'm thinking (but perhaps incorrectly?) that it
would be good to keep the OSD down+in, to try to read from it as long as
possible. Why doesn't it get marked down and stay that way when I
command it?
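
For reference, this is more or less what I'm doing (osd.NN stands in for the actual id of the failing OSD):

  # from the mon node
  ceph osd down osd.NN

  # a few seconds later it shows up as 'up' again
  ceph osd tree | grep osd.NN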

Context: our cluster is in a somewhat suboptimal state (see below);
this is after one of our OSD nodes failed and took a week to get back up
(long story). Due to a seriously unbalanced filling of our OSDs, we kept
having to reweight OSDs to stay below the 85% threshold. Several disks
are starting to fail now (they're 4+ years old, so failures are expected
to occur more frequently).
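
For the reweighting I've just been doing the usual, something along these lines (with the OSD id and weight depending on the situation):

  # check how full the individual OSDs are
  ceph osd df tree

  # nudge the fullest ones down a bit
  ceph osd reweight osd.NN 0.85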

I'm open to suggestions to help get us back to health_ok more quickly,
but I think we'll get there eventually anyway...

Cheers

/Simon

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
