Hi,

I'm not sure if the repair waits for snaptrim; but it does need a scrub
reservation on all the related OSDs, hence our script. And I've also
observed that the repair request isn't queued up -- if the OSDs are busy
with other scrubs, the repair request is forgotten.

-- Dan

On Wed, May 27, 2020 at 11:28 AM Daniel Aberger - Profihost AG
<d.aberger@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> (un)fortunately I can't test it because I managed to repair the pg.
>
> snaptrim and snaptrim_wait had been part of this particular pg's
> status. As I was trying to look deeper into the case I had a watch on
> ceph health detail and noticed that snaptrim/snaptrim_wait was
> suddenly no longer part of the status.
>
> So I gave it another try with ceph pg repair 18.19a and suddenly the
> pg's status changed to active+clean+inconsistent+repair. It repaired
> successfully.
>
> Is snaptrim somehow blocking repair instructions? I would have thought
> that repair instructions would be queued up until they can be
> performed, but it does not seem to work as I expected.
>
> Anyway, I'll keep your script in mind and give it a shot if it happens
> again. Thank you :)
>
> Daniel
>
> On 25.05.20 at 17:40, Dan van der Ster wrote:
> > Hi,
> >
> > Does this help?
> >
> > https://github.com/cernceph/ceph-scripts/blob/master/tools/scrubbing/autorepair.sh
> >
> > Cheers, Dan
> >
> > On Mon, May 25, 2020 at 5:18 PM Daniel Aberger - Profihost AG
> > <d.aberger@xxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> we are currently experiencing problems with ceph pg repair not
> >> working on Ceph Nautilus 14.2.8.
> >>
> >> ceph health detail is showing us an inconsistent pg:
> >>
> >> [aaaaax-yyyy ~]# ceph health detail
> >> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> >> OSD_SCRUB_ERRORS 1 scrub errors
> >> PG_DAMAGED Possible data damage: 1 pg inconsistent
> >>     pg 18.19a is active+clean+inconsistent+snaptrim_wait, acting
> >> [21,15,39,18,0,9]
> >>
> >> When we try to repair it, nothing happens.
> >>
> >> [aaaaax-yyyy ~]# ceph pg repair 18.19a
> >> instructing pg 18.19as0 on osd.21 to repair
> >>
> >> There are no new entries in OSD 21's log file.
> >>
> >> We have no trouble repairing pgs in our other clusters, so I assume
> >> it might be related to this cluster using Erasure Coding. But this
> >> is just a wild guess.
> >>
> >> I found a similar problem in this mailing list:
> >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/026304.html
> >>
> >> Unfortunately the solution of waiting more than a week until it
> >> fixes itself isn't quite satisfying.
> >>
> >> Has anyone had similar issues and knows how to repair these
> >> inconsistent pgs, or what is causing the delay?
> >>
> >>
> >> --
> >> Kind regards
> >> Daniel Aberger
> >> Ihr Profihost Team
> >>
> >> -------------------------------
> >> Profihost AG
> >> Expo Plaza 1
> >> 30539 Hannover
> >> Germany
> >>
> >> Tel.: +49 (511) 5151 8181 | Fax: +49 (511) 5151 8282
> >> URL: http://www.profihost.com | E-Mail: info@xxxxxxxxxxxxx
> >>
> >> Registered office: Hannover, VAT ID DE813460827
> >> Court of registration: Amtsgericht Hannover, Register no.: HRB 202350
> >> Management Board: Cristoph Bluhm, Sebastian Bluhm, Stefan Priebe
> >> Supervisory Board: Prof. Dr. iur. Winfried Huck (Chairman)
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
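
[Editor's note] The manual workflow discussed in this thread (spot the inconsistent PG in `ceph health detail`, then issue `ceph pg repair` for it) can be sketched with a small helper. This is a hypothetical illustration, not part of Dan's autorepair.sh; the function names and the parsing regex are assumptions based on the Nautilus health output quoted above:

```python
import re

def find_inconsistent_pgs(health_detail):
    """Parse `ceph health detail` text and return (pgid, acting_osds)
    tuples for every PG whose state includes 'inconsistent'.

    Matches lines of the form:
      pg 18.19a is active+clean+inconsistent+snaptrim_wait, acting [21,15,39,18,0,9]
    """
    pat = re.compile(r"pg (\S+) is (\S*inconsistent\S*), acting \[([\d,]+)\]")
    pgs = []
    for line in health_detail.splitlines():
        m = pat.search(line)
        if m:
            pgid = m.group(1)
            acting = [int(o) for o in m.group(3).split(",")]
            pgs.append((pgid, acting))
    return pgs

def repair_commands(health_detail):
    """Build the `ceph pg repair <pgid>` command line for each
    inconsistent PG found in the health output."""
    return ["ceph pg repair %s" % pgid
            for pgid, _ in find_inconsistent_pgs(health_detail)]
```

On a live cluster one would feed it the output of `ceph health detail` (e.g. via subprocess) and, per Dan's observation, first make sure no other scrubs hold the reservations on the acting OSDs -- otherwise the repair request may simply be dropped rather than queued.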