Hi,
your CLI output is barely readable, although that probably doesn't
matter much here. Apparently it's an EC pool you're referring to? A pg
repair tries to repair inconsistent objects, see [1] for more details.
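For an inconsistent PG the usual sequence would be to look at what
scrub flagged and then let the primary repair it, roughly like this
(the PG ID is just taken from your output):

  rados list-inconsistent-obj 27.7c --format=json-pretty   # objects flagged by scrub
  ceph pg repair 27.7c                                     # ask the primary OSD to repair them

But your PGs are active+clean+snaptrim, not inconsistent, so repair is
probably not what you need here anyway.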
I don't really know how to explain "repeer" either, and I'm not a dev,
so maybe someone from the Ceph team can explain it better. But as I
understand it, a temporary mapping to just the primary OSD is created,
and that would trigger something like a refresh, I guess:
  // map to just primary; it will map back to what it wants
  pending_inc.new_pg_temp[pgid] = { primary };
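If I'm not mistaken, that temporary mapping should briefly show up in
the osdmap as a pg_temp entry while the PG re-peers, something like:

  ceph pg repeer 27.7c          # what you already did
  ceph osd dump | grep pg_temp  # the temporary primary-only mapping

and it goes away again once peering has finished.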
So content-wise this doesn't really affect your PG, I think; it's just
a refresh. Why your PGs get stuck (is it always the same one? Is a
specific OSD involved in all of the cases?) is difficult to answer
while knowing so little about your cluster. Is the overall cluster
health status okay? Are your OSDs (slow or fast drives?) highly
utilized?
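To narrow that down I would start with something like (the PG ID again
just taken from your output):

  ceph -s              # overall cluster health
  ceph pg 27.7c query  # detailed state of one of the stuck PGs
  ceph osd df tree     # per-OSD fill level and PG distribution
  ceph osd perf        # commit/apply latencies, slow drives stand out here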
[1]
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#more-information-on-pg-repair
Quoting 苏察哈尔灿 <2644294460@xxxxxx>:
My ceph cluster sometimes gets stuck in the active+clean+snaptrim
state when doing regular snapshot deletion, and the corresponding PGs
do not change for a long time, as follows:
27.7c   14350  0  0  0  38073876582  0  0  2864  3000  active+clean+snaptrim  9h  43777'677286  43777:1018248  [52,55,20,36,29,91,9,63,14,2]p52   [52,55,20,36,29,91,9,63,14,2]p52   2024-07-21T13:57:12.602903+0000  2024-07-20T11:02:06.953985+0000  52  queued for scrub
27.c1   14055  0  0  0  37408096205  0  0  2754  3000  active+clean+snaptrim  9h  43777'676887  43777:952875   [0,21,39,62,26,58,41,86,66,2]p0    [0,21,39,62,26,58,41,86,66,2]p0    2024-07-22T02:26:43.470883+0000  2024-07-20T05:09:23.763918+0000  44  periodic scrub scheduled @ 2024-07-23T07:46:50.084316+0000
27.19b  14711  0  0  0  38926389125  0  0  2765  3000  active+clean+snaptrim  9h  43777'698478  43777:849574   [5,7,29,14,61,10,22,0,19,37]p5     [5,7,29,14,61,10,22,0,19,37]p5     2024-07-22T03:28:39.036189+0000  2024-07-17T14:00:43.082766+0000  40  periodic scrub scheduled @ 2024-07-23T12:30:57.488823+0000
27.1a3  14323  0  0  0  37918899397  0  0  2930  3000  active+clean+snaptrim  9h  43777'675713  43777:943324   [52,4,10,46,20,69,49,39,44,30]p52  [52,4,10,46,20,69,49,39,44,30]p52  2024-07-21T11:15:16.442506+0000  2024-07-20T04:10:45.391550+0000  50  queued for scrub
No change for 9 hours. Then, after I typed the command "ceph pg
repeer 27.7c", the corresponding PG state was restored to normal. I
don't know what this "repeer" command is for; will it have any effect
on the PG? What's the difference between "repeer" and "ceph pg
repair"? And why do the PGs get stuck so often? Thanks for your help!
My ceph version is 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
quincy (stable)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx