Fri, Jan 7, 2022 at 06:21, Lee <lquince@xxxxxxxxx>:
> Hello,
>
> As per another post, I have been having a huge issue since a pg_num
> increase took my cluster offline.
>
> I have got to a point where I have just 20 PGs down / unavailable due to
> not being able to start an OSD.
>
> I have been able to export the PG from the offline OSD.
>
> I then import it to a clean / new OSD, which is set to weight 0 in CRUSH,
> using the following command:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
>     --no-mon-config --op import --file /mnt/croit/backup/pg10.0

If you did this with the OSD offline, and then started it afterwards, then
you did everything correctly. On the other hand, my preferred approach would
be not to set the weight to 0, but to create a separate, otherwise-unused
CRUSH root and assign the OSDs carrying the extra data to it; your approach
is also valid, though (see the sketch at the end of this message).

> When I start the OSD I see in the log loads of stuff being transitioned to
> Stray.

This is an indicator that you did everything right. "Stray" means an extra
copy of data in a place where it is not supposed to be - but that's exactly
what you did, and what you were supposed to do!

> Do I need to tell Ceph to use the PG on the OSD to rebuild? When I query
> the PG at the end, it complains about the offline OSD being marked as
> offline?

We need to understand why this happens. The usual scenario where the
automatic rebuild doesn't start is when some of the PGs that you exported and
imported do not represent the known latest copy of the data. Maybe there is
another copy on another dead OSD - try exporting and importing it, too (see
the export/import sketch at the end of this message). Basically, you want to
inject all copies of all PGs that are unhealthy.

A copy of "ceph pg dump" output (as an attached file, not inline) might be
helpful. Also, run "ceph pg 1.456 query" (where 1.456 is a PG ID that you
have imported) for a few of the problematic PGs.

> I have looked online and cannot find a definitive guide on the process /
> steps that should be taken.

There is no single definitive guide. My approach would be to treat the broken
OSDs as broken for good, but without using any command that includes the word
"lost" (because that actually loses data). You can mark the dead OSDs as
"out" if Ceph didn't do it for you already. Then add enough capacity with
non-zero weight in the correct pool (or maybe do nothing if you already have
enough space). Ceph will rebalance the data automatically once it obtains
proof that it really has the latest data.

When I encountered this "dead OSD" issue last time, I found it useful to
compare the "ceph health detail" output over time, and with/without the OSDs
carrying the injected PGs running. At the very least, it provides a useful
metric of what remains to be done.

Also, an interesting read-only command (but maybe for later) would be
"ceph osd safe-to-destroy 123", where 123 is the dead OSD's id.

--
Alexander E. Patrakov
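
For reference, a minimal sketch of the "separate, otherwise-unused CRUSH
root" alternative mentioned above. The names "recovery" and "recovery-host",
the weight 1.0, and osd.17 are placeholders for illustration, not values
taken from this thread:

    # Create an extra root that no pool's CRUSH rule references, plus a host
    # bucket under it, so nothing is placed there automatically.
    ceph osd crush add-bucket recovery root
    ceph osd crush add-bucket recovery-host host
    ceph osd crush move recovery-host root=recovery

    # Relocate the OSD that carries the injected PGs under that root, with a
    # non-zero CRUSH weight (instead of leaving it at weight 0).
    ceph osd crush set osd.17 1.0 host=recovery-host root=recovery

The idea from the message above is that the injected copies on that OSD stay
readable during peering, while no pool places new data on it because no CRUSH
rule points at the "recovery" root.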
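
A hedged sketch of one export/import cycle for a further dead OSD, as
suggested above ("maybe there is another copy on another dead OSD"). The
source OSD id 23 is a placeholder, and the PG id and backup path only mirror
the thread's example; both ceph-objectstore-tool invocations must run with
the corresponding OSD daemon stopped:

    # On the dead source OSD (daemon stopped): export the PG to a file.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
        --no-mon-config --op export --pgid 10.0 \
        --file /mnt/croit/backup/pg10.0-from-osd23

    # On the recovery OSD (daemon stopped): import that file, then start
    # the OSD so the copy shows up as "stray" during peering.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
        --no-mon-config --op import \
        --file /mnt/croit/backup/pg10.0-from-osd23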
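
Finally, the checks and clean-up commands mentioned above, gathered in one
place; 1.456 and 123 are the same placeholder ids used in the message:

    # Peering state of one problematic PG (look at "recovery_state" and
    # which OSDs it is still waiting for).
    ceph pg 1.456 query

    # Cluster-wide view of what is still unhealthy; compare over time and
    # with/without the OSD carrying the injected PGs running.
    ceph health detail
    ceph pg dump > /tmp/pg-dump.txt

    # Mark a dead OSD out (if Ceph has not done it already), and later check
    # whether it can be removed without losing data. Do NOT mark it "lost".
    ceph osd out 123
    ceph osd safe-to-destroy 123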