I ask because I tried to export the pool contents, but the export gets stuck
(blocked I/O) on the "incomplete" PGs. Would marking the PGs as complete
with ceph-objectstore-tool be an option? Or using the *dangerous*
osd_find_best_info_ignore_history_les option on the affected OSDs?
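For reference, this is roughly what I mean (OSD id 12 and PG 2.1f are placeholders for our affected OSDs/PGs; as I understand it, the OSD daemon must be stopped before running ceph-objectstore-tool, and both approaches can silently discard recent writes):

```shell
# Option 1: with the OSD stopped, mark the stuck PG as complete on that OSD.
# WARNING: this discards peering history and can lose the newest writes.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --pgid 2.1f --op mark-complete

# Option 2: let the OSD pick the best surviving copy while ignoring
# last_epoch_started history (equally dangerous), then revert afterwards:
ceph config set osd.12 osd_find_best_info_ignore_history_les true
# ... restart osd.12, wait for the PGs to peer, then:
ceph config set osd.12 osd_find_best_info_ignore_history_les false
```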

Regards
Pablo

On Mon, Jun 17, 2024 at 11:46 AM Matthias Grandl <matthias.grandl@xxxxxxxx>
wrote:

> We are missing information here. Ceph status claims all OSDs are up. Did an
> OSD die, and was it already removed from the CRUSH map? If so, the only
> chance I see of preventing data loss is exporting the PGs off that OSD and
> importing them into another OSD. But, like David, I am not too optimistic.
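For what it's worth, the export/import path described above would look something like this (the OSD ids and PG id below are placeholders; both OSD daemons have to be stopped while ceph-objectstore-tool is running against their data):

```shell
# On the node holding the dead OSD's data: export the PG to a file.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
    --pgid 2.1f --op export --file /mnt/backup/pg-2.1f.export

# Copy the file to the target node, then import into a healthy (stopped) OSD:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
    --pgid 2.1f --op import --file /mnt/backup/pg-2.1f.export
```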
>
> Matthias Grandl
> Head Storage Engineer
> matthias.grandl@xxxxxxxx
>
> On 17. Jun 2024, at 17:26, cellosofia1@xxxxxxxxx wrote:
>
> 
> Hi everyone,
>
> Thanks for your kind responses
>
> I know the following is not the best scenario, but sadly I didn't have the
> opportunity to install this cluster myself.
>
> More information about the problem:
>
> * Replicated pools, size 2, min_size 1
> * Ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy
> (stable)
> * Virtual Machines setup, 2 MGR Nodes, 2 OSD Nodes, 4 VMs in total.
> * 27 OSDs right now
> * Rook environment: rook: v1.9.5
> * Kubernetes Server Version: v1.24.1
>
> I attach a .txt file with the output of some diagnostic commands for reference
>
> What do you think?
>
> Regards
> Pablo
>
> On Mon, Jun 17, 2024 at 11:01 AM Matthias Grandl <matthias.grandl@xxxxxxxx>
> wrote:
>
>> Ah, scratch that, my first paragraph about replicated pools is actually
>> incorrect. If it's a replicated pool and it shows incomplete, it means the
>> most recent copy of the PG is missing. So ideally you would recover the PGs
>> from the dead OSDs in any case, if possible.
>>
>> Matthias Grandl
>> Head Storage Engineer
>> matthias.grandl@xxxxxxxx
>>
>> > On 17. Jun 2024, at 16:56, Matthias Grandl <matthias.grandl@xxxxxxxx>
>> wrote:
>> >
>> > Hi Pablo,
>> >
>> > It depends. If it's a replicated setup, it might be as easy as marking
>> > the dead OSDs as lost to get the PGs to recover. In that case it
>> > basically just means that you are below the pool's min_size.
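A rough sketch of the "mark lost" step mentioned above (the OSD id is a placeholder; this should only be done once you are sure the OSD's data is truly unrecoverable):

```shell
# Tell the cluster to give up on a dead OSD so its PGs can peer without it.
# WARNING: irreversible; any data that existed only on this OSD is gone.
ceph osd lost 12 --yes-i-really-mean-it
```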
>> >
>> > If it is an EC setup, it might be quite a bit more painful, depending
>> on what happened to the dead OSDs and whether they are at all recoverable.
>> >
>> >
>> > Matthias Grandl
>> > Head Storage Engineer
>> > matthias.grandl@xxxxxxxx
>> >
>> >> On 17. Jun 2024, at 16:46, David C. <david.casier@xxxxxxxx> wrote:
>> >>
>> >> Hi Pablo,
>> >>
>> >> Could you tell us a little more about how that happened?
>> >>
>> >> Do you have a min_size >= 2 (or the EC equivalent)?
>> >> ________________________________________________________
>> >>
>> >> Regards,
>> >>
>> >> *David CASIER*
>> >>
>> >> ________________________________________________________
>> >>
>> >>
>> >>
>> >> On Mon, Jun 17, 2024 at 4:26 PM, cellosofia1@xxxxxxxxx
>> >> <cellosofia1@xxxxxxxxx> wrote:
>> >>
>> >>> Hi community!
>> >>>
>> >>> Recently we had a major outage in production and after running the
>> >>> automated ceph recovery, some PGs remain in "incomplete" state, and IO
>> >>> operations are blocked.
>> >>>
>> >>> Searching the documentation, forums, and this mailing list archive, I
>> >>> haven't yet found whether this means the data is recoverable or not.
>> >>> We don't have any "unknown" objects or PGs, so I believe this is
>> >>> somehow an intermediate stage where we have to tell Ceph which version
>> >>> of the objects to recover from.
>> >>>
>> >>> We are willing to work with a Ceph consultant specialist, because the
>> >>> data at stake is very critical, so if you're interested please let me
>> >>> know off-list to discuss the details.
>> >>>
>> >>> Thanks in advance
>> >>>
>> >>> Best Regards
>> >>> Pablo
>> >>> _______________________________________________
>> >>> ceph-users mailing list -- ceph-users@xxxxxxx
>> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >>>
>>
> <diagnostic-commands-ceph.txt>
>
>



