I’ve done the PG import dance a couple of times. It was very slow, but it did work ultimately. Depending on the situation, if there is one valid copy available you can enable recovery by temporarily setting min_size on the pool to 1, reverting it once recovery completes. If you run with min_size 1 all the time, that can lead to this situation in the first place.

> On Feb 3, 2024, at 11:39 AM, Alexander E. Patrakov <patrakov@xxxxxxxxx> wrote:
>
> Hi,
>
> I think that the approach with exporting and importing PGs would be
> a priori more successful than the one based on pvmove or ddrescue. The
> reason is that you don't need to export/import all the data that the
> failed disk holds, only the PGs that Ceph cannot recover otherwise.
> The logic here is that those are likely not the same PGs that are
> making the tools crash.
>
> Note that after the export/import operation Ceph might still think "I
> need a copy from that failed disk and not the one that you gave me";
> in that case, just export a copy of the same PG from the other failed
> OSD and import it elsewhere, up to the total number of copies. If even
> that doesn't help, "ceph osd lost XX" would be the last (very
> dangerous) words to convince Ceph that osd.XX will never be seen
> again.
>
>> On Sat, Feb 3, 2024 at 5:35 AM Eugen Block <eblock@xxxxxx> wrote:
>>
>> Hi,
>>
>> if the OSDs are deployed as LVs (by ceph-volume) you could try to do a
>> pvmove to a healthy disk. There was a thread here a couple of weeks
>> ago explaining the steps. I don’t have it at hand right now, but it
>> should be easy to find.
>> Of course, there’s no guarantee that this will be successful. I also
>> can’t tell whether Igor’s approach is more promising.
>>
>> Quoting Igor Fedotov <igor.fedotov@xxxxxxxx>:
>>
>>> Hi Carl,
>>>
>>> you might want to use ceph-objectstore-tool to export PGs from the
>>> faulty OSDs and import them back into healthy ones.
>>>
>>> The process can be quite tricky, though.
>>>
>>> There is also a pending PR (https://github.com/ceph/ceph/pull/54991)
>>> to make the tool more tolerant of disk errors.
>>>
>>> The patch is worth trying in some cases, but it is not a silver bullet.
>>>
>>> And generally, whether recovery is doable depends greatly on the
>>> actual error(s).
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 02/02/2024 19:03, Carl J Taylor wrote:
>>>> Hi,
>>>> I have a small cluster with some faulty disks in it, and I want to
>>>> clone the data from the faulty disks onto new ones.
>>>>
>>>> The cluster is currently down and I am unable to run things like
>>>> ceph-bluestore-tool fsck, but ceph-bluestore-tool bluefs-export does
>>>> appear to be working.
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Many thanks,
>>>> Carl
>
> --
> Alexander E. Patrakov
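
P.S. For anyone finding this thread later: the temporary min_size change I mentioned is just a pool setting flip. A rough sketch, assuming a replicated pool named "mypool" with size 3 — the pool name and the value you revert to are placeholders, not something from this thread:

    # note the current value first so you can restore it afterwards
    ceph osd pool get mypool min_size
    # allow PGs to go active with a single surviving copy (risky; temporary only)
    ceph osd pool set mypool min_size 1
    # ... wait for recovery/backfill to finish, then revert to the old value ...
    ceph osd pool set mypool min_size 2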
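
The PG export/import that Alexander and Igor describe is done with ceph-objectstore-tool against a stopped OSD. A rough sketch only — the OSD IDs, PG ID and file paths below are placeholders you would substitute from your own cluster:

    # on the node with the failed OSD (the OSD daemon must be stopped)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.1f --op export --file /mnt/rescue/2.1f.export

    # on a node with a healthy OSD (also stopped), import that PG
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --pgid 2.1f --op import --file /mnt/rescue/2.1f.export

    # last resort only, as Alexander warns: tell the cluster osd.12 is gone
    ceph osd lost 12 --yes-i-really-mean-it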
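
Eugen's pvmove approach works one layer down: it moves the OSD's LV extents onto a new physical disk via LVM. A rough sketch, assuming the OSD is stopped; the device names and VG name are placeholders (find the real VG with pvs/vgs):

    pvcreate /dev/sdY                    # initialise the new disk
    vgextend ceph-<vg-name> /dev/sdY     # add it to the OSD's volume group
    pvmove /dev/sdX /dev/sdY             # copy extents off the failing disk
    vgreduce ceph-<vg-name> /dev/sdX     # drop the old disk from the VG
    pvremove /dev/sdX                    # optionally wipe the old LVM label

Note that pvmove reads every allocated extent, so it can still fail on a disk with bad sectors — which is why Alexander expects the targeted PG export/import to fare better.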
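
And for reference, the bluefs-export Carl mentions dumps an OSD's BlueFS files (the RocksDB metadata) to a directory, which is mainly useful for offline inspection or salvage. A sketch with the OSD path and output directory as placeholders:

    ceph-bluestore-tool bluefs-export \
        --path /var/lib/ceph/osd/ceph-3 --out-dir /mnt/bluefs-dump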