Re: 1 PG stuck in "active+undersized+degraded" for long time

I can provide some more details. These were the recovery steps taken so far; they started from here (I don't know the whole/exact story, though):

  70/868386704 objects unfound (0.000%)
  Reduced data availability: 8 pgs inactive, 8 pgs incomplete
  Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 45558/8766139136 objects degraded (0.001%), 2 pgs degraded, 1 pg undersized

By reducing min_size for the EC pools, some of the inactive PGs were cleaned up. For the remaining four incomplete PGs, they got further by marking the unfound objects as lost:

# ceph pg 15.f4f mark_unfound_lost delete
pg has 70 objects unfound and apparently lost marking
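For reference, the min_size reduction mentioned above would have looked roughly like this. The pool name is a placeholder, and the k=8, m=3 EC profile is only an assumption inferred from the 11-OSD acting set shown below (k+m=11); adjust to the actual profile:

```shell
# Check the pool's current setting (pool name is a placeholder)
ceph osd pool get <ec-pool> min_size

# Temporarily allow I/O with only k shards available (assuming k=8 here).
# Running at min_size == k leaves no redundancy margin, so this should
# only be done briefly to let the PGs recover.
ceph osd pool set <ec-pool> min_size 8

# Revert once recovery is done (the default for EC pools is k+1, i.e. 9 here)
ceph osd pool set <ec-pool> min_size 9
```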

And now one PG is stuck degraded:

# ceph pg ls degraded
PG      OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG   STATE                       SINCE  VERSION        REPORTED        UP                                                       ACTING                                                   SCRUB_STAMP                 DEEP_SCRUB_STAMP
15.28f0 44994    44994     0          0        55288092914  0            0           3077  active+undersized+degraded  93s    310625'599302  310657:3603406  [2147483647,343,355,415,426,640,302,392,78,202,607]p343  [2147483647,343,355,415,426,640,302,392,78,202,607]p343  2021-04-11 03:18:39.164439  2021-04-10 01:42:16.182528

Setting osd.343 down didn't have any effect. Note the 2147483647 (CRUSH_ITEM_NONE) in the UP and ACTING sets: CRUSH failed to map an eleventh OSD for this PG. I therefore suggested increasing set_choose_tries from 100 to 150 in the respective crush_rule (I found a thread where that seemed to have helped), but don't have a response to that yet. If nothing else helps, would marking the PG's objects as unfound_lost (with data loss) help here?
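Raising the tries in a rule is usually done by editing a decompiled CRUSH map. A sketch using the standard crushtool workflow (the rule name and contents here are illustrative, not taken from this cluster):

```shell
# Export and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# In crushmap.txt, inside the EC rule in question, add or raise the tries, e.g.:
#   rule my_ec_rule {
#       ...
#       step set_choose_tries 150
#       step take default
#       ...
#   }

# Recompile and inject the modified map
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin
```

After injecting the new map, the affected PGs re-peer, which is when the extra tries can let CRUSH find the missing OSD.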

Zitat von Anthony D'Atri <anthony.datri@xxxxxxxxx>:

Sometimes one can even get away with "ceph osd down 343", which doesn't touch the OSD process itself. I have had occasions when this goosed peering in a less intrusive way. I believe it just marks the OSD down in the mons' map, and when that makes it to the OSD, the OSD responds with "I'm not dead yet" and gets marked up again.

On Jul 20, 2023, at 13:50, Matthew Leonard (BLOOMBERG/ 120 PARK) <mleonard33@xxxxxxxxxxxxx> wrote:

Assuming you're running systemd-managed OSDs, you can run the following command on the host that OSD 343 resides on.

systemctl restart ceph-osd@343

From: siddhit.renake@xxxxxxxxxx At: 07/20/23 13:44:36 UTC-4:00 To: ceph-users@xxxxxxx Subject: Re: 1 PG stuck in "active+undersized+degraded" for long time

What should be appropriate way to restart primary OSD in this case (343) ?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




