Re: Incomplete PGs. Ceph Consultant Wanted


 



With 1 PG out of 16 missing in the metadata pool, that is already enough to
make browsing the FS very difficult.
Your challenge now is to locate the important objects in the data pool.

Perhaps try to identify the important objects by retrieving the layout/parent
xattrs on the objects in the cephfs-replicated pool.

Example:
# rados -p cephfs-replicated ls | while read inode_chunk; do
    echo "== $inode_chunk =="
    rados -p cephfs-replicated getxattr "$inode_chunk" layout
    rados -p cephfs-replicated getxattr "$inode_chunk" parent
done
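
The parent xattr, where present, holds the inode backtrace and can be decoded
into a readable path with ceph-dencoder (assuming it is available in the
toolbox image); <head-object> is a placeholder:

# rados -p cephfs-replicated getxattr <head-object> parent > /tmp/parent.bin
# ceph-dencoder type inode_backtrace_t import /tmp/parent.bin decode dump_json

The JSON output lists the ancestor dentries, which lets you map an object back
to its file path in the FS.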



On Mon, Jun 17, 2024 at 5:59 PM cellosofia1@xxxxxxxxx <cellosofia1@xxxxxxxxx>
wrote:

> The command used for the export attempt was:
>
> [rook@rook-ceph-tools-recovery-77495958d9-plfch ~]$ rados export -p
> cephfs-replicated /mnt/recovery/backup-rados-cephfs-replicated
>
> We made sure we had enough space for this operation, and mounted the
> /mnt/recovery path using hostPath in the modified rook "toolbox" deployment.
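>
> For reference, the hostPath can be added to the toolbox deployment with a
> patch along these lines (deployment name and namespace may differ in your
> setup):
>
> # kubectl -n rook-ceph patch deployment rook-ceph-tools-recovery --type=json \
>     -p='[{"op":"add","path":"/spec/template/spec/volumes/-","value":{"name":"recovery","hostPath":{"path":"/mnt/recovery"}}},
>          {"op":"add","path":"/spec/template/spec/containers/0/volumeMounts/-","value":{"name":"recovery","mountPath":"/mnt/recovery"}}]'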
>
> Regards
>
> On Mon, Jun 17, 2024 at 11:56 AM cellosofia1@xxxxxxxxx <
> cellosofia1@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> I understand,
>>
>> We had to re-create the OSDs because of backing storage hardware failure,
>> so recovering from old OSDs is not possible.
>>
>> From your current understanding, is there a possibility to recover at least
>> some of the information, i.e. the fragments that are not missing?
>>
>> I ask this because I tried to export the pool contents, but it gets stuck
>> (I/O blocked) because of the "incomplete" PGs. Maybe marking the PGs as
>> complete with ceph-objectstore-tool would be an option? Or using the
>> *dangerous* osd_find_best_info_ignore_history_les option for the affected
>> OSDs?
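>>
>> For context, the mark-complete route would look roughly like this, run with
>> the OSD that holds the PG stopped (OSD data path and PG id are placeholders):
>>
>> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid <pgid> --op mark-complete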
>>
>> Regards
>> Pablo
>>
>> On Mon, Jun 17, 2024 at 11:46 AM Matthias Grandl <
>> matthias.grandl@xxxxxxxx> wrote:
>>
>>> We are missing info here. Ceph status claims all OSDs are up. Did an OSD
>>> die, and was it already removed from the CRUSH map? If so, the only chance I
>>> see of preventing data loss is exporting the PGs off that OSD and importing
>>> them into another OSD. But, like David, I am not too optimistic.
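>>>
>>> Roughly, with both OSD daemons stopped, that would be something like:
>>>
>>> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<src> --pgid <pgid> --op export --file /tmp/<pgid>.export
>>> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<dst> --pgid <pgid> --op import --file /tmp/<pgid>.export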
>>>
>>> Matthias Grandl
>>> Head Storage Engineer
>>> matthias.grandl@xxxxxxxx
>>>
>>> On 17. Jun 2024, at 17:26, cellosofia1@xxxxxxxxx wrote:
>>>
>>>
>>> Hi everyone,
>>>
>>> Thanks for your kind responses
>>>
>>> I know the following is not the best scenario, but sadly I didn't have the
>>> opportunity to install this cluster myself.
>>>
>>> More information about the problem:
>>>
>>> * We use replicated pools
>>> * Size 2, min_size 1
>>> * Ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
>>> * Virtual machine setup: 2 MGR nodes, 2 OSD nodes, 4 VMs in total
>>> * 27 OSDs right now
>>> * Rook environment: rook v1.9.5
>>> * Kubernetes server version: v1.24.1
>>>
>>> I attach a .txt with the result of some diagnostic commands for reference
>>>
>>> What do you think?
>>>
>>> Regards
>>> Pablo
>>>
>>> On Mon, Jun 17, 2024 at 11:01 AM Matthias Grandl <
>>> matthias.grandl@xxxxxxxx> wrote:
>>>
>>>> Ah, scratch that: my first paragraph about replicated pools is actually
>>>> incorrect. If it's a replicated pool and it shows incomplete, it means the
>>>> most recent copy of the PG is missing. So the ideal would be to recover the
>>>> PG from the dead OSDs in any case, if possible.
>>>>
>>>> Matthias Grandl
>>>> Head Storage Engineer
>>>> matthias.grandl@xxxxxxxx
>>>>
>>>> > On 17. Jun 2024, at 16:56, Matthias Grandl <matthias.grandl@xxxxxxxx>
>>>> wrote:
>>>> >
>>>> > Hi Pablo,
>>>> >
>>>> > It depends. If it's a replicated setup, it might be as easy as marking
>>>> > dead OSDs as lost to get the PGs to recover. In that case it basically
>>>> > just means that you are below the pool's min_size.
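>>>> >
>>>> > Roughly, once you are sure an OSD cannot be brought back:
>>>> >
>>>> > # ceph osd lost <id> --yes-i-really-mean-it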
>>>> >
>>>> > If it is an EC setup, it might be quite a bit more painful, depending on
>>>> > what happened to the dead OSDs and whether they are at all recoverable.
>>>> >
>>>> >
>>>> > Matthias Grandl
>>>> > Head Storage Engineer
>>>> > matthias.grandl@xxxxxxxx
>>>> >
>>>> >> On 17. Jun 2024, at 16:46, David C. <david.casier@xxxxxxxx> wrote:
>>>> >>
>>>> >> Hi Pablo,
>>>> >>
>>>> >> Could you tell us a little more about how that happened?
>>>> >>
>>>> >> Do you have min_size >= 2 (or the EC equivalent)?
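>>>> >>
>>>> >> Something like this shows it per pool:
>>>> >>
>>>> >> # ceph osd pool ls detail
>>>> >>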
>>>> >> ________________________________________________________
>>>> >>
>>>> >> Regards,
>>>> >>
>>>> >> *David CASIER*
>>>> >>
>>>> >> ________________________________________________________
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Mon, Jun 17, 2024 at 4:26 PM cellosofia1@xxxxxxxxx <cellosofia1@xxxxxxxxx>
>>>> >> wrote:
>>>> >>
>>>> >>> Hi community!
>>>> >>>
>>>> >>> Recently we had a major outage in production, and after running the
>>>> >>> automated Ceph recovery, some PGs remain in "incomplete" state and I/O
>>>> >>> operations are blocked.
>>>> >>>
>>>> >>> Searching the documentation, forums, and this mailing list archive, I
>>>> >>> haven't yet found whether this means the data is recoverable or not. We
>>>> >>> don't have any "unknown" objects or PGs, so I believe this is an
>>>> >>> intermediate stage where we have to tell Ceph which version of the
>>>> >>> objects to recover from.
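>>>> >>>
>>>> >>> The state of the affected PGs can be inspected with something like
>>>> >>> (<pgid> is a placeholder):
>>>> >>>
>>>> >>> # ceph pg ls incomplete
>>>> >>> # ceph pg <pgid> query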
>>>> >>>
>>>> >>> We are willing to work with a Ceph Consultant Specialist, because the
>>>> >>> data at stake is very critical, so if you're interested please let me
>>>> >>> know off-list to discuss the details.
>>>> >>>
>>>> >>> Thanks in advance
>>>> >>>
>>>> >>> Best Regards
>>>> >>> Pablo
>>>>
>>> <diagnostic-commands-ceph.txt>
>>>
>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



