I’ve done the PG import dance a couple of times. It was very slow, but it did work ultimately. Depending on the situation, if there is one valid copy available you can enable recovery by temporarily setting min_size on the pool to 1, reverting it once recovery completes. If you run with min_size 1 all the time, that can lead to this situation in the first place.

> On Feb 3, 2024, at 11:39 AM, Alexander E. Patrakov <patrakov@xxxxxxxxx> wrote:
>
> Hi,
>
> I think that the approach with exporting and importing PGs would be
> a priori more successful than the one based on pvmove or ddrescue. The
> reason is that you don't need to export/import all the data that the
> failed disk holds, only the PGs that Ceph cannot recover otherwise.
> The logic here is that those are likely not the same PGs that are
> making the tools crash.
>
> Note that after the export/import operation Ceph might still think "I
> need a copy from that failed disk and not the one that you gave me";
> in that case, just export a copy of the same PG from the other failed
> OSD and import it elsewhere, up to the total number of copies. If even
> that doesn't help, "ceph osd lost XX" would be the last (very
> dangerous) words to convince Ceph that osd.XX will never be seen
> again.
>
>> On Sat, Feb 3, 2024 at 5:35 AM Eugen Block <eblock@xxxxxx> wrote:
>>
>> Hi,
>>
>> if the OSDs are deployed as LVs (by ceph-volume) you could try to do a
>> pvmove to a healthy disk. There was a thread here a couple of weeks
>> ago explaining the steps. I don’t have it at hand right now, but it
>> should be easy to find.
>> Of course, there’s no guarantee that this will be successful. I also
>> can’t tell whether Igor’s approach is more promising.
>>
>> Quoting Igor Fedotov <igor.fedotov@xxxxxxxx>:
>>
>>> Hi Carl,
>>>
>>> you might want to use ceph-objectstore-tool to export PGs from the
>>> faulty OSDs and import them back into healthy ones.
>>>
>>> The process can be quite tricky, though.
>>>
>>> There is also a pending PR (https://github.com/ceph/ceph/pull/54991)
>>> to make the tool more tolerant of disk errors.
>>>
>>> The patch is worth trying in some cases, but it is not a silver bullet.
>>>
>>> And generally, whether recovery is doable depends greatly on the
>>> actual error(s).
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 02/02/2024 19:03, Carl J Taylor wrote:
>>>> Hi,
>>>> I have a small cluster with some faulty disks in it, and I want to
>>>> clone the data from the faulty disks onto new ones.
>>>>
>>>> The cluster is currently down and I am unable to run things like
>>>> ceph-bluestore-tool fsck, but ceph-bluestore-tool bluefs-export does
>>>> appear to be working.
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Many thanks,
>>>> Carl
>
> --
> Alexander E. Patrakov
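
P.S. For anyone finding this thread later: the temporary min_size change I mentioned is just a pool setting flip. A rough sketch, assuming a replicated pool named "mypool" with size 3 — the pool name and the value you revert to are placeholders, not something from this thread:

    # note the current value first so you can restore it afterwards
    ceph osd pool get mypool min_size
    # allow PGs to go active with a single surviving copy (risky; temporary only)
    ceph osd pool set mypool min_size 1
    # ... wait for recovery/backfill to finish, then revert to the old value ...
    ceph osd pool set mypool min_size 2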
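
The PG export/import that Alexander and Igor describe is done with ceph-objectstore-tool against a stopped OSD. A rough sketch only — the OSD IDs, PG ID and file paths below are placeholders you would substitute from your own cluster:

    # on the node with the failed OSD (the OSD daemon must be stopped)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.1f --op export --file /mnt/rescue/2.1f.export

    # on a node with a healthy OSD (also stopped), import that PG
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --pgid 2.1f --op import --file /mnt/rescue/2.1f.export

    # last resort only, as Alexander warns: tell the cluster osd.12 is gone
    ceph osd lost 12 --yes-i-really-mean-it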
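
Eugen's pvmove approach works one layer down: it moves the OSD's LV extents onto a new physical disk via LVM. A rough sketch, assuming the OSD is stopped; the device names and VG name are placeholders (find the real VG with pvs/vgs):

    pvcreate /dev/sdY                    # initialise the new disk
    vgextend ceph-<vg-name> /dev/sdY     # add it to the OSD's volume group
    pvmove /dev/sdX /dev/sdY             # copy extents off the failing disk
    vgreduce ceph-<vg-name> /dev/sdX     # drop the old disk from the VG
    pvremove /dev/sdX                    # optionally wipe the old LVM label

Note that pvmove reads every allocated extent, so it can still fail on a disk with bad sectors — which is why Alexander expects the targeted PG export/import to fare better.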
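
And for reference, the bluefs-export Carl mentions dumps an OSD's BlueFS files (the RocksDB metadata) to a directory, which is mainly useful for offline inspection or salvage. A sketch with the OSD path and output directory as placeholders:

    ceph-bluestore-tool bluefs-export \
        --path /var/lib/ceph/osd/ceph-3 --out-dir /mnt/bluefs-dump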