Re: Correct Usage of the ceph-objectstore-tool??

Cheers for the explanation.

I found the problem: Ceph had marked the new OSD as out. As soon as I
marked it back in, it started to rebuild.
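
For anyone hitting the same issue later, marking an OSD back in is just
(OSD id 17 taken from the import command quoted below):

ceph osd in 17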

Thank you for taking the time to explain the above.

Looks like I have everything back (fingers crossed).

On Fri, 7 Jan 2022 at 03:08, Alexander E. Patrakov <patrakov@xxxxxxxxx>
wrote:

> Fri, 7 Jan 2022 at 06:21, Lee <lquince@xxxxxxxxx>:
>
>> Hello,
>>
>> As per another post, I have been having a huge issue since a pg_num increase
>> took my cluster offline.
>>
>> I have got to the point where I have just 20 PGs down/unavailable because I
>> am not able to start an OSD.
>>
>> I have been able to export the PG from the offline OSD.
>>
>> I then import it to a clean/new OSD, which is set to weight 0 in CRUSH,
>> using the following command:
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
>>     --no-mon-config --op import --file /mnt/croit/backup/pg10.0
>>
>
> If you did this with the OSD offline and then started it, you did everything
> correctly. OTOH, my preferred approach would be not to set the weight to 0,
> but to create a separate, otherwise-unused CRUSH root and assign the OSDs
> holding the extra data to it; your approach is also valid, though.
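>
> For the archives, the full export/import round trip looks roughly like this
> (OSD id 16 and the systemd unit names are placeholders for your setup; the
> paths and PG id are taken from your message):
>
> # the dead source OSD (placeholder id 16) must not be running
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
>     --pgid 10.0 --op export --file /mnt/croit/backup/pg10.0
> # the target OSD must also be stopped during the import
> systemctl stop ceph-osd@17
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
>     --no-mon-config --op import --file /mnt/croit/backup/pg10.0
> systemctl start ceph-osd@17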
>
>
>> When I start the OSD, the log shows loads of stuff being transitioned to
>> Stray.
>>
>
> This is an indicator that you did everything right. Stray means an extra
> copy of data in a place where it is not supposed to be - but that's exactly
> what you did and what you were supposed to do!
>
>
>> Do I need to tell Ceph to use the PG on the OSD to rebuild? When I query
>> the PG at the end, it complains about the offline OSD being offline?
>>
>
> We need to understand why this happens. The usual scenario where the
> automatic rebuild doesn't start is when some of the PGs that you exported
> and imported do not represent the known latest copy of the data. Maybe
> there is another copy on another dead OSD, try exporting and importing it,
> too. Basically, you want to inject all copies of all PGs that are
> unhealthy. A copy of "ceph pg dump" output (as an attached file, not
> inline) might be helpful. Also, run "ceph pg 1.456 query" where 1.456 is
> the PG ID that you have imported - for a few problematic PGs.
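>
> Something along these lines (10.0 is just the PG id from your import command;
> attach the dump as a file rather than pasting it inline):
>
> ceph pg dump > pg_dump.txt
> ceph pg 10.0 query > pg_10.0_query.txt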
>
>
>> I have looked online and cannot find a definitive guide on the process /
>> steps that should be taken.
>>
>
> There is no single definitive guide. My approach would be to treat the
> broken OSDs as broken for good, but without using any command that includes
> the word "lost" (because this actually loses data). You can mark the dead
> OSDs as "out", if Ceph didn't do it for you already. Then add enough
> capacity with non-zero weight in the correct pool (or maybe do nothing if
> you already have enough space). Ceph will rebalance the data automatically
> when it obtains proof that it really has the latest data.
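>
> Marking a dead OSD out (123 is a placeholder id) is simply:
>
> ceph osd out 123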
>
> When I encountered this "dead osd" issue last time, I found it useful to
> compare the "ceph health detail" output over time, and with/without the
> OSDs with injected PGs running. At the very least, it provides a useful
> metric of what remains to be done.
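>
> A crude way to capture snapshots for that comparison (the file naming is
> just a suggestion):
>
> ceph health detail > health_$(date +%Y%m%d_%H%M).txt
> diff health_20220107_0900.txt health_20220107_0930.txt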
>
> Also an interesting read-only command (but maybe for later) would be:
> "ceph osd safe-to-destroy 123" where 123 is the dead OSD id.
>
> --
> Alexander E. Patrakov
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



