On Fri, Aug 4, 2023 at 11:33 AM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
>
> Dave,
>
> Actually, my failure domain is OSD since I so far only have 9 OSD nodes
> but EC 8+2. However, the drives are still functioning, except that one
> has failed multiple times in the last few days, requiring a node
> power-cycle to recover. I will certainly mark that one out immediately.
>
> The other two pending failures are behaving more politely, so I am
> assuming that the cluster could copy the data elsewhere as part of the
> rebalance. I think I'm also concerned about the rebalance process moving
> data to these drives with pending failures.
>
> Since I'm EC 8+2, perhaps it is safe to mark two out simultaneously?

Dave,

You should be able to mark out two OSDs simultaneously without worry, as
long as you have enough space, etc. When you mark an OSD out, it still
participates in the cluster as long as it remains up, and it can aid in
the backfilling process. You'll therefore want to avoid stopping/downing
the OSDs until backfilling completes; if you stop both OSDs before
backfilling completes, you will put yourself in a bad spot.

If all PGs are active+clean, you may both a) out the two OSDs and b)
stop/down *only the one* imminently failing OSD (leaving the second OSD
being drained still up), and things should also be fine... but you will
be vulnerable to blocked ops/unavailable data if _subsequent_ OSDs fail
unexpectedly, including the second OSD being drained, depending upon your
CRUSH map and cluster status.

Note that if your intent is to purge the OSD after it is drained, I
believe you should do a `ceph osd crush reweight osd.X 0` rather than a
`ceph osd out osd.X` or `ceph osd reweight osd.X 0`, as it should result
in slightly less net data movement.

Cheers,
Tyler
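
P.S. For what it's worth, here is a rough sketch of the sequence described
above. The IDs (osd.X, osd.Y) are placeholders for your two failing OSDs,
and the systemctl line assumes a package-based/systemd deployment rather
than cephadm, so adjust for your environment:

    # Drain both OSDs by zeroing their CRUSH weight (slightly less net
    # data movement than `ceph osd out` if they will be purged afterwards)
    ceph osd crush reweight osd.X 0
    ceph osd crush reweight osd.Y 0

    # Stop only the imminently failing OSD; leave the other one up so it
    # can keep participating in backfill
    systemctl stop ceph-osd@X      # run on the node hosting osd.X

    # Watch progress until all PGs are back to active+clean
    ceph -s
    ceph pg stat

    # Only once everything is active+clean, remove the drained OSDs
    ceph osd purge osd.X --yes-i-really-mean-it
    ceph osd purge osd.Y --yes-i-really-mean-it

On a cephadm-managed cluster the stop step would instead be something like
`ceph orch daemon stop osd.X`.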