Pablo,

Since some PGs are empty and all OSDs are enabled, I'm not at all optimistic
about the outcome. Was the command "ceph osd force-create-pg" executed while
OSDs were missing?

On Mon, Jun 17, 2024 at 5:26 PM cellosofia1@xxxxxxxxx <cellosofia1@xxxxxxxxx>
wrote:

> Hi everyone,
>
> Thanks for your kind responses.
>
> I know the following is not the best scenario, but sadly I didn't have the
> opportunity to install this cluster myself.
>
> More information about the problem:
>
> * We use replicated pools
> * Replica size 2, min_size 1
> * Ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy
>   (stable)
> * Virtual machine setup: 2 MGR nodes, 2 OSD nodes, 4 VMs in total
> * 27 OSDs right now
> * Rook environment: rook v1.9.5
> * Kubernetes server version: v1.24.1
>
> I attach a .txt with the output of some diagnostic commands for reference.
>
> What do you think?
>
> Regards
> Pablo
>
> On Mon, Jun 17, 2024 at 11:01 AM Matthias Grandl <matthias.grandl@xxxxxxxx>
> wrote:
>
>> Ah, scratch that -- my first paragraph about replicated pools is actually
>> incorrect. If it's a replicated pool and it shows incomplete, it means the
>> most recent copy of the PG is missing. So the ideal would be to recover
>> the PG from the dead OSDs in any case, if possible.
>>
>> Matthias Grandl
>> Head Storage Engineer
>> matthias.grandl@xxxxxxxx
>>
>> > On 17. Jun 2024, at 16:56, Matthias Grandl <matthias.grandl@xxxxxxxx>
>> wrote:
>> >
>> > Hi Pablo,
>> >
>> > It depends. If it's a replicated setup, it might be as easy as marking
>> the dead OSDs as lost to get the PGs to recover. In that case it basically
>> just means that you are below the pool's min_size.
>> >
>> > If it is an EC setup, it might be quite a bit more painful, depending
>> on what happened to the dead OSDs and whether they are at all recoverable.
>> >
>> > Matthias Grandl
>> > Head Storage Engineer
>> > matthias.grandl@xxxxxxxx
>> >
>> >> On 17. Jun 2024, at 16:46, David C. <david.casier@xxxxxxxx> wrote:
>> >>
>> >> Hi Pablo,
>> >>
>> >> Could you tell us a little more about how that happened?
>> >>
>> >> Do you have min_size >= 2 (or the EC equivalent)?
>> >> ________________________________________________________
>> >>
>> >> Regards,
>> >>
>> >> *David CASIER*
>> >> ________________________________________________________
>> >>
>> >> On Mon, Jun 17, 2024 at 4:26 PM cellosofia1@xxxxxxxxx
>> >> <cellosofia1@xxxxxxxxx> wrote:
>> >>
>> >>> Hi community!
>> >>>
>> >>> Recently we had a major outage in production, and after running the
>> automated Ceph recovery, some PGs remain in the "incomplete" state and IO
>> operations are blocked.
>> >>>
>> >>> Searching the documentation, forums, and this mailing list archive, I
>> haven't yet found whether this means the data is recoverable or not. We
>> don't have any "unknown" objects or PGs, so I believe this is somehow an
>> intermediate stage where we have to tell Ceph which version of the objects
>> to recover from.
>> >>>
>> >>> We are willing to work with a Ceph consultant, because the data at
>> stake is very critical, so if you're interested please let me know
>> off-list to discuss the details.
>> >>>
>> >>> Thanks in advance
>> >>>
>> >>> Best Regards
>> >>> Pablo
>> >>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
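For reference, a minimal sketch of the commands discussed or implied in this
thread, assuming a replicated pool whose dead OSDs are genuinely
unrecoverable. <pgid> and <osd-id> are placeholders, and both "osd lost" and
"force-create-pg" discard data, so treat them as last resorts rather than a
recommendation:

    # Identify the incomplete PGs and the OSDs that peering is waiting for
    ceph health detail
    ceph pg dump_stuck inactive
    ceph pg <pgid> query    # check "down_osds_we_would_probe" / "peering_blocked_by"

    # Only if a dead OSD holding the newest copy cannot be brought back:
    # mark it lost so peering can proceed without it (its data is given up)
    ceph osd lost <osd-id> --yes-i-really-mean-it

    # Absolute last resort: recreate the PG as empty, discarding its contents
    ceph osd force-create-pg <pgid> --yes-i-really-mean-it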