Hi Josh,

Thanks for your reply. I already tried that, with no luck. The primary OSD goes down and hangs forever upon the "mark_unfound_lost delete" command. I guess the PG is too damaged to salvage, unless one really starts deleting individual corrupt objects?

Anyway, as I said, the files in the PG have been identified and are under backup, so I just want the PG healthy again, no matter what ;-)

I actually discovered that removing the PG's shards with ceph-objectstore-tool does indeed get the PG back to active+clean (containing 0 objects, though). One just needs to run a final remove - start/stop OSD - repair - mark-complete on the primary OSD. A scrub tells me that the "active+clean" state is for real.

I also found out that the more automated "force-create-pg" command only works on PGs that are in the down state.

Best,
Jesper

--------------------------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: jelka@xxxxxxxxx
Tlf: +45 50906203

> On 20 Sep 2022, at 15.40, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Jesper,
>
> Given that the PG is marked recovery_unfound, I think you need to
> follow https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#unfound-objects.
>
> Josh
>
> On Tue, Sep 20, 2022 at 12:56 AM Jesper Lykkegaard Karlsen
> <jelka@xxxxxxxxx> wrote:
>>
>> Dear all,
>>
>> System: latest Octopus, 8+3 erasure-coded CephFS
>>
>> I have a PG that has been driving me crazy.
>> It got into a bad state after heavy backfilling, combined with OSDs going down in turn.
>>
>> Its state is:
>>
>> active+recovery_unfound+undersized+degraded+remapped
>>
>> I have tried repairing it with ceph-objectstore-tool, but no luck so far.
>> Given the time recovery takes this way, and since the data are under backup, I thought I would take the "easy" approach instead and:
>>
>> * scan pg_files with cephfs-data-scan
>> * delete the data belonging to that pool
>> * recreate the PG with "ceph osd force-create-pg"
>> * restore the data
>>
>> Although, this has turned out not to be so easy after all.
>>
>> ceph osd force-create-pg 20.13f --yes-i-really-mean-it
>>
>> seems to be accepted well enough with "pg 20.13f now creating, ok", but then nothing happens.
>> Issuing the command again just gives a "pg 20.13f already creating" response.
>>
>> If I restart the primary OSD, the pending force-create-pg disappears.
>>
>> I read that this could be due to a CRUSH map issue, but I have checked, and that does not seem to be the case.
>>
>> Would it, for instance, be possible to do the force-create-pg manually with something like this?:
>>
>> * set nobackfill and norecovery
>> * delete the PG's shards one by one
>> * unset nobackfill and norecovery
>>
>> Any idea on how to proceed from here is most welcome.
>>
>> Thanks,
>> Jesper
>>
>> --------------------------
>> Jesper Lykkegaard Karlsen
>> Scientific Computing
>> Centre for Structural Biology
>> Department of Molecular Biology and Genetics
>> Aarhus University
>> Universitetsbyen 81
>> 8000 Aarhus C
>>
>> E-mail: jelka@xxxxxxxxx
>> Tlf: +45 50906203
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
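
P.S. For the archives, roughly the sequence I ended up running - take it as a sketch only. The OSD IDs, the EC shard suffixes ("s<N>") and the data path are placeholders for my cluster, and every ceph-objectstore-tool step has to be run with that OSD stopped:

  # on each OSD still holding a shard of the broken PG (OSD stopped)
  systemctl stop ceph-osd@<osd-id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd-id> \
      --pgid 20.13fs<shard> --op remove --force
  systemctl start ceph-osd@<osd-id>

  # final pass on the primary OSD: remove its shard, start/stop the OSD,
  # repair, then mark the (now empty) PG complete
  systemctl stop ceph-osd@<primary-id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<primary-id> \
      --pgid 20.13fs<primary-shard> --op remove --force
  systemctl start ceph-osd@<primary-id>
  ceph pg repair 20.13f
  systemctl stop ceph-osd@<primary-id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<primary-id> \
      --pgid 20.13fs<primary-shard> --op mark-complete
  systemctl start ceph-osd@<primary-id>

  # a deep scrub afterwards confirmed the PG really is active+clean
  ceph pg deep-scrub 20.13f
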
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx